Trust Boundary
The line in your agent system between trusted inputs (system prompt, internal config) and untrusted inputs (user messages, retrieved documents, tool outputs).
Last updated: April 26, 2026
Definition
A trust boundary is where data crosses from a trusted source to a less-trusted one. For LLM agents, the system prompt and your own configured tool definitions are trusted. Everything else, user messages, retrieved RAG content, web pages the agent fetches, results from tool calls, output from sub-agents, is untrusted. The model has no built-in way to distinguish trusted from untrusted text inside the same context window: they are all just tokens. The defense is structural: wrap untrusted content in clearly-delimited tags, instruct the model to treat tagged content as data not instructions, and never let the model take a high-stakes action based purely on untrusted content.
When To Use
Map every trust boundary in your agent's data flow. Wherever untrusted content enters the prompt, mark it explicitly. Audits should walk these boundaries.
Related Terms
Building with Trust Boundary?
I've shipped this pattern in real production systems. If you want a second pair of eyes on your architecture, that's what I do.