Context Stuffing
The bad practice of filling the context window with everything potentially useful instead of selectively retrieving only the relevant subset.
Last updated: April 26, 2026
Definition
Context stuffing is the anti-pattern where engineers, fearful of missing relevant information, dump everything into the context window: full database tables, entire conversation histories, all available tools, every chunk that scored above a low retrieval threshold. The result is high token cost, slow time-to-first-token, and, counterintuitively, worse output quality. Frontier models attend less reliably to information buried in the middle of long contexts, so a 100K-token stuffed context often produces worse answers than a curated 10K-token context.
The cure is curation. For RAG, retrieve more candidates than you need (say, the top 50), rerank down to the top 5 or 10, then send only those. For tools, expose only the tools relevant to the current task class, not all 50 you have ever defined. For conversation history, summarize old turns instead of pasting them verbatim. Every byte you put in the context window competes for the model's attention with every other byte. Less is almost always more.
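The retrieve-then-rerank step above can be sketched as follows. This is a minimal illustration, not a production retriever: the two scoring functions are stand-ins for a real vector search and a real cross-encoder or reranker API, and all names here are hypothetical.

```python
def retrieve_candidates(query: str, chunks: list[str], k: int = 50) -> list[str]:
    """First pass: cheap lexical-overlap score, keep a wide top-k.
    In production this would be a vector-store similarity search."""
    q_words = set(query.lower().split())

    def overlap(chunk: str) -> float:
        c_words = set(chunk.lower().split())
        return len(q_words & c_words) / (len(q_words) or 1)

    return sorted(chunks, key=overlap, reverse=True)[:k]


def rerank(query: str, candidates: list[str], k: int = 5) -> list[str]:
    """Second pass: a more expensive relevance model (simulated here
    by substring counting) trims the wide candidate set to the few
    chunks actually worth spending context tokens on."""
    q_words = query.lower().split()

    def score(chunk: str) -> int:
        # Stand-in for a cross-encoder or reranker service call.
        return sum(chunk.lower().count(w) for w in q_words)

    return sorted(candidates, key=score, reverse=True)[:k]


chunks = [
    "rotate API keys monthly using the key rotation endpoint",
    "our office is closed on public holidays",
    "API keys grant access to the billing service",
    "deploy the service with docker compose",
]
query = "rotate API keys"
context = "\n\n".join(rerank(query, retrieve_candidates(query, chunks)))
```

The point is the shape of the pipeline: cast a wide net cheaply, then pay for precision only on the candidates, and send the model just the survivors.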
When To Use
Treat context stuffing as a real bug. When agent quality regresses, check the size and signal-to-noise of what you are putting in the context before blaming the model.
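One way to make that check concrete is to audit the token share of each context component before debugging the model. The sketch below uses a crude ~4 characters-per-token estimate (swap in your tokenizer for real numbers); the component names and budget are illustrative, not prescriptive.

```python
def audit_context(parts: dict[str, str], budget_tokens: int = 10_000):
    """Estimate each component's token count and share of the total,
    so oversized pieces (verbatim history, unused tool schemas,
    over-retrieved chunks) stand out before you blame the model."""
    est = {name: len(text) // 4 for name, text in parts.items()}  # rough tokens
    total = sum(est.values())
    report = {
        name: (tokens, round(100 * tokens / total, 1))
        for name, tokens in est.items()
    }
    return total, total > budget_tokens, report


total, over_budget, report = audit_context({
    "system_prompt": "a" * 400,       # ~100 tokens
    "conversation_history": "b" * 40_000,  # ~10,000 tokens: the stuffing
    "retrieved_chunks": "c" * 4_000,  # ~1,000 tokens
})
```

If one component dominates the report, that is usually where the regression lives, and summarization or tighter retrieval is the fix.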
Worried about Context Stuffing in production?
I've debugged and defended against this in real production AI systems. If you want a second pair of eyes on your architecture or your guardrails, that's what I do.