How we contain Claude across products

Anthropic describes how it contains increasingly capable Claude agents by matching product architecture to threat model: server-side containers for claude.ai, OS sandboxes for Claude Code, and VM isolation for Claude Cowork.

Argument Map

Three risk categories

User misuse, model misbehavior, and external attackers each require overlapping defenses.

Three defense components

The environment, the model, and external content are defended with different mechanisms and guarantees.

Pattern 3: local VM

Claude Cowork runs code inside a VM boundary and controls how much of the host filesystem is exposed through mount modes.
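The idea of exposure controlled by mount modes can be illustrated with a minimal sketch. The names `MountMode`, `Mount`, and `is_allowed`, and the example paths, are hypothetical and are not taken from Cowork. The product enforces this at the VM boundary, not in application code.

```python
import posixpath
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePosixPath


class MountMode(Enum):
    READ_ONLY = "ro"
    READ_WRITE = "rw"


@dataclass(frozen=True)
class Mount:
    host_path: PurePosixPath
    mode: MountMode


def is_allowed(mounts: list[Mount], target: str, write: bool) -> bool:
    """Return True only if target lies under a mount whose mode permits the access."""
    # Lexical normalization only; a real boundary must also handle symlinks.
    path = PurePosixPath(posixpath.normpath(target))
    for mount in mounts:
        if path == mount.host_path or mount.host_path in path.parents:
            if write and mount.mode is not MountMode.READ_WRITE:
                return False
            return True
    return False  # paths outside every mount are not exposed at all


# Hypothetical configuration: one read-only folder, one writable folder.
mounts = [
    Mount(PurePosixPath("/home/user/reports"), MountMode.READ_ONLY),
    Mount(PurePosixPath("/home/user/drafts"), MountMode.READ_WRITE),
]
```

With this configuration, reads under `/home/user/reports` succeed, writes there are refused, and anything outside the two mounted folders is unreachable whatever the access type.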

Looking ahead

Persistent memory poisoning, multi-agent trust escalation, and agent identity are identified as evolving risks.

Summary principles

Contain first, match isolation to user expertise, and prefer battle-tested primitives over custom components.

Products and Containment Patterns

claude.ai

Server-side Claude product that runs code in ephemeral gVisor containers on isolated infrastructure.

Claude Code

Developer agent that runs on a user's machine with filesystem, shell, and network access.

Claude Cowork

Knowledge-work agent that uses a local VM to isolate code execution and file access.

Ephemeral Container

Per-session server-side container pattern used for claude.ai code execution.

Local VM

Virtual-machine isolation pattern used by Claude Cowork.

Model Context Protocol

Tool and connector protocol whose local and remote deployments carry trust and prompt-injection implications.

Agent Identity

Open design question about whether agents should have their own principal identity or inherit user permissions.

FAQ

What problem does the article address?

It explains how Anthropic caps the blast radius of increasingly capable Claude agents across multiple products.

What are the three risk categories?

The categories are user misuse, model misbehavior, and external attackers.

What are the three defense components?

The article highlights the environment, the model, and the external content the agent can reach.

Why is human-in-the-loop approval insufficient by itself?

Users approve many prompts, become fatigued, and may miss harmful actions, so oversight that is probabilistic or depends on human attention cannot be the only defense.

What is the central role of containment?

Containment limits what the agent is able to reach or do through sandboxes, VMs, filesystem boundaries, and egress controls.

How does claude.ai contain code execution?

It runs code server-side in isolated gVisor containers with ephemeral per-session filesystems.
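The "ephemeral per-session filesystem" part of that answer can be sketched with the Python standard library. This shows only the lifecycle (a fresh workspace per session, discarded at the end) and provides none of the isolation that a gVisor container does. The helper name `session_workspace` is hypothetical.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def session_workspace():
    """Yield a fresh scratch directory that is deleted when the session ends."""
    with tempfile.TemporaryDirectory(prefix="session-") as root:
        yield Path(root)


with session_workspace() as first:
    (first / "notes.txt").write_text("state from session one")
    first_root = first  # remember the path to check it after cleanup

with session_workspace() as second:
    # A new session starts empty; nothing from the first one carries over.
    leftover = list(second.iterdir())
```

Because each session's directory is removed when the session ends, anything a session writes, including anything malicious, does not persist into the next one.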

How does Claude Code differ from claude.ai?

Claude Code runs on a user's machine and needs shell, filesystem, and network access, so it relies on approvals plus OS-level sandboxing.

Why does Claude Cowork use a VM?

Cowork targets general knowledge workers, so it uses an always-on VM boundary rather than expecting users to judge low-level commands.

What lesson came from the allowlist incident?

A destination allowlist is also a capability grant; every reachable function on an allowed domain becomes part of the attack surface.
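A small sketch can make the "allowlist as capability grant" point concrete. The incident's details are not given here, so the host `api.example.com`, the endpoints, and both rule sets are purely hypothetical. The sketch contrasts a host-level allowlist with a narrower rule keyed on method, host, and path prefix.

```python
from urllib.parse import urlsplit

# Hypothetical host-level allowlist: every endpoint on the host is reachable.
ALLOWED_HOSTS = {"api.example.com"}

# Hypothetical narrower rules: (method, host, path prefix).
ALLOWED_CALLS = {
    ("GET", "api.example.com", "/v1/packages/"),
}


def host_allowed(method: str, url: str) -> bool:
    """Host-only check: the method and path are ignored entirely."""
    return urlsplit(url).hostname in ALLOWED_HOSTS


def call_allowed(method: str, url: str) -> bool:
    """Narrower check: method, host, and path prefix must all match a rule."""
    # A real egress proxy would also normalize the path before comparing.
    parts = urlsplit(url)
    return any(
        method == m and parts.hostname == h and parts.path.startswith(p)
        for m, h, p in ALLOWED_CALLS
    )


download = ("GET", "https://api.example.com/v1/packages/requests")
upload = ("POST", "https://api.example.com/v1/files")
```

Here `host_allowed` accepts both the download and the upload, because the host check cannot tell them apart, while `call_allowed` accepts only the download. Allowing a domain grants every function that domain exposes unless the rule is made narrower.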

What risks does Anthropic identify as next?

Persistent memory poisoning, multi-agent trust escalation, and cross-platform agent identity are highlighted as future concerns.

Glossary

Blast Radius

Maximum possible damage from an agent failure or compromise.

Containment

Hard environment-level limits on what an agent can access or affect.

Egress Controls

Network rules that restrict data leaving an execution environment.

Prompt Injection

Malicious instructions embedded in content that the agent reads.

MCP

Protocol and ecosystem for connecting agents to tools and data sources.

Agent Identity

The authorization model that determines whether an agent acts as itself, as a user, or both.
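The three options in that definition can be sketched as a permission calculation. This is an illustration under stated assumptions, not a description of any shipped design: the names `ActsAs` and `effective_permissions` and the permission strings are hypothetical, and reading "both" as the intersection of the two grant sets is one possible interpretation.

```python
from enum import Enum


class ActsAs(Enum):
    AGENT = "agent"
    USER = "user"
    BOTH = "both"


def effective_permissions(acts_as: ActsAs, agent: set[str], user: set[str]) -> set[str]:
    """Compute what the agent may do under each identity model."""
    if acts_as is ActsAs.AGENT:
        return set(agent)
    if acts_as is ActsAs.USER:
        return set(user)
    # Assumption: "both" means an action needs both principals to hold the permission.
    return agent & user


# Hypothetical grants for illustration.
agent_grants = {"calendar:read", "files:read"}
user_grants = {"calendar:read", "calendar:write", "files:read", "mail:send"}
```

Under the intersection reading, an agent acting as "both" can never exceed either its own grants or the user's, which is why the choice of identity model directly affects blast radius.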
