
Context Stuffing

The bad practice of filling the context window with everything potentially useful instead of selectively retrieving only the relevant subset.

Last updated: April 26, 2026

Definition

Context stuffing is the anti-pattern where engineers, fearful of missing relevant information, dump everything into the context window: full database tables, entire conversation histories, all available tools, every chunk that scored above a low retrieval threshold. The result: high token cost, slow time-to-first-token, and, counterintuitively, worse output quality. Frontier models attend less reliably to information far from the question, so a 100K-token stuffed context often produces worse answers than a curated 10K-token context.

The cure is curation. For RAG, retrieve more candidates than you need (top 50), rerank to top 5 or 10, then send. For tools, expose only the tools relevant to the current task class, not all 50 you have ever defined. For conversation history, summarize old turns instead of pasting them verbatim. Every byte you put in the context window competes for the model's attention with every other byte. Less is almost always more.
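The retrieve-wide, rerank-narrow step can be sketched as below. The function names and the toy word-overlap scorer are illustrative assumptions; a real pipeline would plug in a cross-encoder reranker in place of `word_overlap`:

```python
import re
from typing import Callable

def curate_context(
    query: str,
    candidates: list[str],
    score: Callable[[str, str], float],
    fetch: int = 50,
    keep: int = 5,
) -> list[str]:
    """Over-retrieve, rerank, and send only the top few chunks."""
    pool = candidates[:fetch]                      # wide retrieval (top 50)
    ranked = sorted(pool, key=lambda c: score(query, c), reverse=True)
    return ranked[:keep]                           # curated subset (top 5-10)

def word_overlap(query: str, chunk: str) -> float:
    # Toy relevance score: fraction of query words present in the chunk.
    q = set(re.findall(r"\w+", query.lower()))
    c = set(re.findall(r"\w+", chunk.lower()))
    return len(q & c) / (len(q) or 1)

docs = [
    "reranking improves retrieval quality",
    "unrelated billing policy text",
    "retrieval thresholds and rerankers",
]
print(curate_context("how does reranking help retrieval?", docs, word_overlap, keep=2))
```

The point of the shape is that `fetch` and `keep` are independent dials: recall is protected by the wide first pass, while the model's attention is protected by the narrow second one.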

When To Use

Treat context stuffing as a real bug. When agent quality regresses, check the size and signal-to-noise of what you are putting in the context before blaming the model.


Worried about Context Stuffing in production?

I've debugged and defended against this in real production AI systems. If you want a second pair of eyes on your architecture or your guardrails, that's what I do.