Risk-reward shifts as agent capability grows
The article frames containment as the way to cap blast radius while preserving useful agent deployments.
Anthropic describes how it contains increasingly capable Claude agents by matching product architecture to threat model: server-side containers for claude.ai, OS sandboxes for Claude Code, and VM isolation for Claude Cowork.
User misuse, model misbehavior, and external attackers each require overlapping defenses.
The environment, the model, and external content are defended with different mechanisms and guarantees.
claude.ai uses server-side gVisor containers with ephemeral filesystems and isolated infrastructure.
Claude Code combines developer approvals with OS-level sandboxes and network-deny defaults.
Claude Cowork runs code behind a VM boundary and controls host filesystem exposure through mount modes.
The article treats every function available through an allowed domain as part of the attack surface.
MCPs, connectors, web content, and tool outputs require both supply-chain review and prompt-injection inspection.
Persistent memory poisoning, multi-agent trust escalation, and agent identity are identified as evolving risks.
Contain first, match isolation to user expertise, and prefer battle-tested primitives over custom components.
Server-side Claude product that runs code in ephemeral gVisor containers on isolated infrastructure.
Developer agent that runs on a user's machine with filesystem, shell, and network access.
Knowledge-work agent that uses a local VM to isolate code execution and file access.
Per-session server-side container pattern used for claude.ai code execution.
Approval-based oversight of agent actions, useful but vulnerable to fatigue.
Virtual-machine isolation pattern used by Claude Cowork.
VM proxy that validates Anthropic API requests and blocks attacker-provided keys.
Tool and connector protocol whose local and remote deployments carry trust and prompt-injection implications.
Open design question about whether agents should have their own principal identity or inherit user permissions.
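As an illustration of the VM proxy entry above (validating Anthropic API requests and blocking attacker-provided keys), the following is a minimal sketch and not the article's implementation. The names OutboundRequest and should_forward are hypothetical, and the sketch assumes the proxy knows the key provisioned for the session and that the key arrives in the x-api-key request header.

```python
import hmac
from dataclasses import dataclass

# Assumption: the proxy forwards traffic to this host only.
ALLOWED_HOST = "api.anthropic.com"


@dataclass(frozen=True)
class OutboundRequest:
    """Hypothetical view of a request leaving the VM."""
    host: str
    headers: dict[str, str]


def should_forward(request: OutboundRequest, session_key: str) -> bool:
    """Return True only for API requests that carry the session's own key."""
    if not session_key:
        return False
    if request.host.lower() != ALLOWED_HOST:
        return False
    # Header names are case-insensitive, so normalise before lookup.
    headers = {name.lower(): value for name, value in request.headers.items()}
    presented = headers.get("x-api-key", "")
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(presented.encode(), session_key.encode())
```

Under these assumptions, a request to the allowed host that carries any other key is dropped, so an allowed domain cannot be used to send data to an account the attacker controls.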
The categories are user misuse, model misbehavior, and external attackers.
The article highlights the environment, the model, and the external content the agent can reach.
It runs code server-side in isolated gVisor containers with ephemeral per-session filesystems.
Maximum possible damage from an agent failure or compromise.
Hard environment-level limits on what an agent can access or affect.
Network rules that restrict data leaving an execution environment.
Malicious instructions embedded in content that the agent reads.
User approval or supervision of agent behavior.
Reduced attention caused by repeated permission prompts.
Protocol and ecosystem for connecting agents to tools and data sources.
The authorization model that determines whether an agent acts as itself, as a user, or both.
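The network-rule definition above and the network-deny defaults mentioned earlier can be illustrated with a deny-by-default check. This is a sketch under stated assumptions, not the mechanism the article describes: the allowlist contents and the function name egress_allowed are hypothetical, and real enforcement would sit at the sandbox or proxy layer instead of in application code.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would load this from policy.
ALLOWED_HOSTS = frozenset({"registry.example.org"})


def egress_allowed(url: str, allowed_hosts: frozenset[str] = ALLOWED_HOSTS) -> bool:
    """Deny by default: permit only HTTPS requests to an exactly matching host."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # Exact host match, so look-alike hosts that merely contain an
    # allowed name as a prefix or suffix are rejected.
    host = (parts.hostname or "").lower()
    return host in allowed_hosts
```

A host-level check like this is only a first layer: every function reachable through an allowed domain remains part of the attack surface.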