Sandboxing
Isolating agent execution in a restricted environment so security failures cannot affect the rest of the system.
Last updated: April 26, 2026
Definition
Sandboxing means running parts of your agent in an environment where what the agent can do is structurally constrained. Common applications: code-execution tools that run in ephemeral containers with no network access, file-write tools that operate in a per-session temp directory, browser-use agents in headless browsers without persistent cookies. The principle: if the agent does something wrong, the blast radius is the sandbox, not your production system. Sandboxing is the layer that catches failures the prompt-level guardrails miss.
When To Use
Required for any agent with code-execution or file-write capabilities. Use ephemeral containers (Modal, E2B, Daytona) for code execution; never run model-generated code in your production environment directly.
Building with Sandboxing?
I've shipped this pattern in real production systems. If you want a second pair of eyes on your architecture, that's what I do.