Jahanzaib

Guardrails

Input and output filters that prevent unsafe, off-topic, or out-of-policy model behavior.

Last updated: April 26, 2026

Definition

Guardrails are the safety net around your LLM. Input guardrails block prompt injection, off-topic queries, and PII leakage before a request ever reaches the model. Output guardrails check generated text for harmful content, hallucinations, or policy violations before it reaches the user. Implementations range from simple regex filters to dedicated services like AWS Bedrock Guardrails, which adds content filters, denied topics, prompt attack detection, and PII redaction. At a minimum, a production system should have input validation and output sanitization.
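A minimal sketch of the regex-filter end of that spectrum, assuming a hypothetical two-stage pipeline with illustrative patterns (the markers, PII regexes, and denylist terms here are stand-ins, not a complete policy):

```python
import re

# Illustrative patterns only -- a real deployment needs far broader coverage.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN format
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
]
INJECTION_MARKERS = ["ignore previous instructions", "disregard your system prompt"]
BLOCKED_OUTPUT_TERMS = ["internal use only"]      # stand-in output denylist


def check_input(prompt: str) -> tuple[bool, str]:
    """Input guardrail: return (allowed, reason) before the prompt hits the model."""
    lowered = prompt.lower()
    for marker in INJECTION_MARKERS:
        if marker in lowered:
            return False, "possible prompt injection"
    for pattern in PII_PATTERNS:
        if pattern.search(prompt):
            return False, "PII detected in input"
    return True, "ok"


def check_output(text: str) -> tuple[bool, str]:
    """Output guardrail: return (allowed, reason) before text reaches the user."""
    lowered = text.lower()
    for term in BLOCKED_OUTPUT_TERMS:
        if term in lowered:
            return False, "blocked content in output"
    return True, "ok"


allowed, reason = check_input("My SSN is 123-45-6789")
# -> (False, "PII detected in input")
```

In practice you would wrap the model call so that a failed input check short-circuits before inference and a failed output check returns a refusal instead of the raw completion.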

When To Use

Required for any production agent. Skipping guardrails is a leading cause of public AI incidents.

Building with Guardrails?

I've shipped this pattern in real production systems. If you want a second pair of eyes on your architecture, that's what I do.