How we contain Claude across products

Anthropic describes how it contains increasingly capable Claude agents by matching product architecture to threat model: server-side containers for claude.ai, OS sandboxes for Claude Code, and VM isolation for Claude Cowork.

Argument Map

Risk-reward shifts as agent capability grows

The article frames containment as the way to cap blast radius while preserving useful agent deployments.

Three risk categories

User misuse, model misbehavior, and external attackers each require overlapping defenses.

Three defense components

The environment, the model, and external content are defended with different mechanisms and guarantees.

Pattern 1: ephemeral container

claude.ai uses server-side gVisor containers with ephemeral filesystems and isolated infrastructure.

Pattern 2: human-in-the-loop sandbox

Claude Code combines developer approvals with OS-level sandboxes and network-deny defaults.

Pattern 3: local VM

Claude Cowork runs code inside a VM boundary and controls host filesystem exposure through mount modes.

Allowlist as capability grant

The article treats every function available through an allowed domain as part of the attack surface.

Trusting what the agent reads

MCP servers, connectors, web content, and tool outputs require both supply-chain review and prompt-injection inspection.

Looking ahead

Persistent memory poisoning, multi-agent trust escalation, and agent identity are identified as evolving risks.

Summary principles

Contain first, match isolation to user expertise, and prefer battle-tested primitives over custom components.

FAQ

What are the three risk categories?

The categories are user misuse, model misbehavior, and external attackers.

What are the three defense components?

The article highlights the environment, the model, and the external content the agent can reach.

Why is human-in-the-loop approval insufficient by itself?

Users approve many prompts, become fatigued, and may miss harmful actions, so probabilistic or attention-based oversight cannot stand alone.

What is the central role of containment?

Containment uses sandboxes, VMs, filesystem boundaries, and egress controls to limit what the agent can reach or do.

How does claude.ai contain code execution?

It runs code server-side in isolated gVisor containers with ephemeral per-session filesystems.
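
The "ephemeral per-session filesystem" idea can be illustrated with a small sketch. It shows only the lifecycle (create a scratch area, use it, discard it) and not the isolation itself, which the article attributes to gVisor containers. The function name `run_session` and the file names are hypothetical and are not taken from the article.

```python
import tempfile
from pathlib import Path


def run_session(files: dict[str, str]) -> tuple[Path, list[str]]:
    """Create a scratch directory for one session, use it, then discard it.

    Hypothetical helper: it models only the filesystem lifecycle, not the
    container or kernel isolation described in the article.
    """
    with tempfile.TemporaryDirectory(prefix="session-") as workdir:
        root = Path(workdir)
        for name, text in files.items():
            (root / name).write_text(text)
        seen = sorted(p.name for p in root.iterdir())
    # Leaving the with-block deletes the directory and everything in it,
    # so nothing written during the session survives.
    return root, seen


root, seen = run_session({"notes.txt": "scratch data"})
print(seen)           # files that existed during the session
print(root.exists())  # False once the session has ended
```

The design point is that cleanup is tied to the end of the session rather than to any decision the code inside the session makes.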

How does Claude Code differ from claude.ai?

Claude Code runs on a user's machine and needs shell, filesystem, and network access, so it relies on approvals plus OS-level sandboxing.

Why does Claude Cowork use a VM?

Cowork targets general knowledge workers, so it uses an always-on VM boundary rather than expecting users to judge low-level commands.
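
As a rough illustration of host filesystem exposure controlled by mount modes, the sketch below checks a requested path against a list of mounts. The `Mount` type, the `"ro"`/`"rw"` mode strings, and the example paths are assumptions made for this example; the article does not describe Cowork's actual configuration format.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Mount:
    """Hypothetical mount entry: a host directory exposed to the VM."""
    host_path: PurePosixPath
    mode: str  # assumed values: "ro" (read-only) or "rw" (read-write)


def access_allowed(path: str, write: bool, mounts: list[Mount]) -> bool:
    """Return True only if the path lies under a mount that permits the access."""
    target = PurePosixPath(path)
    if ".." in target.parts:
        return False  # refuse paths that try to climb out of a mount
    matches = [m for m in mounts if target.is_relative_to(m.host_path)]
    if not matches:
        return False  # anything not mounted is simply not reachable
    # When mounts nest, the most specific (deepest) one decides.
    best = max(matches, key=lambda m: len(m.host_path.parts))
    return best.mode == "rw" or not write


mounts = [
    Mount(PurePosixPath("/home/user/project"), "rw"),
    Mount(PurePosixPath("/home/user/reference"), "ro"),
]
print(access_allowed("/home/user/project/a.txt", True, mounts))
print(access_allowed("/home/user/reference/spec.pdf", True, mounts))
print(access_allowed("/home/user/.ssh/id_rsa", False, mounts))
```

In this sketch the default is denial: a path outside every mount is unreachable regardless of what the user is asked to approve.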

What lesson came from the allowlist incident?

A destination allowlist is also a capability grant; every reachable function on an allowed domain becomes part of the attack surface.
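
The difference between allowing a destination and granting a capability can be sketched as follows. This is a minimal illustration under stated assumptions: the `Rule` type, the domain `files.example.com`, and the paths are hypothetical, and the article does not say how Anthropic's egress controls are implemented.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Rule:
    """Hypothetical scoped rule: which methods and paths are allowed on a host."""
    host: str
    methods: frozenset[str]
    path_prefix: str


def host_only_allows(url: str, hosts: set[str]) -> bool:
    """Destination allowlist: every function on an allowed host is reachable."""
    return urlsplit(url).hostname in hosts


def scoped_allows(method: str, url: str, rules: list[Rule]) -> bool:
    """Scoped allowlist: the host, the method, and the path must all match."""
    parts = urlsplit(url)
    if ".." in parts.path.split("/"):
        return False  # do not let a path escape its allowed prefix
    return any(
        parts.hostname == r.host
        and method.upper() in r.methods
        and parts.path.startswith(r.path_prefix)
        for r in rules
    )


hosts = {"files.example.com"}
rules = [Rule("files.example.com", frozenset({"GET"}), "/packages/")]

upload = "https://files.example.com/upload"
download = "https://files.example.com/packages/tool.tar.gz"
print(host_only_allows(upload, hosts))         # host check passes for an upload endpoint
print(scoped_allows("POST", upload, rules))    # scoped rule rejects the upload
print(scoped_allows("GET", download, rules))   # scoped rule permits the download
```

The host-only check never looks at the method or path, which is why an allowed domain that also offers an upload or write function widens the attack surface.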

What risks does Anthropic identify as next?

Persistent memory poisoning, multi-agent trust escalation, and cross-platform agent identity are highlighted as future concerns.
