Context Engineering
The discipline of designing what an LLM sees at inference time, including system prompt, retrieved data, tool definitions, and conversation history.
Last updated: April 26, 2026
Definition
Context engineering is the term that largely replaced "prompt engineering" once the field recognized that the prompt is only one input among many. The model's actual input at inference time is everything inside the context window: the system prompt, fetched documents, tool definitions, prior turns, intermediate reasoning, and any RAG retrievals. Context engineering names the practice of designing all of that with the same care once reserved for the system prompt alone. Andrej Karpathy helped popularize the term in mid-2025, and LangChain co-founder Harrison Chase's 2025 essay on context engineering helped turn it into the standard framing for production agent design.
In practice, context engineering is the work of deciding three things for each agent. First: what lives in the system prompt versus what is fetched dynamically. Second: which retrieval strategy keeps the most relevant context inside the token budget without flooding the model. Third: how previous turns are summarized, dropped, or kept verbatim as the conversation grows. Each decision has cost, latency, and quality implications. Bad context engineering is the most common cause of "the model used to work and now it does not" complaints in production: the model did not change; what you put in front of it did.
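The three decisions above can be sketched as a single context-assembly step. This is a minimal illustration, not a production implementation: the function names, the dict shape for retrieved documents, and the 4-characters-per-token heuristic are all assumptions (a real system would use the model's actual tokenizer).

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # A real system would use the model's tokenizer instead.
    return max(1, len(text) // 4)

def assemble_context(system_prompt, retrieved_docs, history, budget=8000):
    """Build the model input, spending the token budget in priority order:
    the static system prompt first, then the highest-scoring retrievals,
    then as many recent turns as still fit."""
    remaining = budget - estimate_tokens(system_prompt)
    context = [system_prompt]

    # Decision 2: include retrievals in relevance order, but reserve
    # roughly 40% of the remaining budget for conversation history.
    for doc in sorted(retrieved_docs, key=lambda d: d["score"], reverse=True):
        cost = estimate_tokens(doc["text"])
        if cost <= remaining * 0.6:
            context.append(doc["text"])
            remaining -= cost

    # Decision 3: keep the newest turns verbatim, walking backwards
    # until the budget runs out; older turns are simply dropped here
    # (a fuller version would summarize them instead).
    kept = []
    for turn in reversed(history):
        cost = estimate_tokens(turn)
        if cost > remaining:
            break
        kept.append(turn)
        remaining -= cost
    context.extend(reversed(kept))
    return "\n\n".join(context)
```

The priority order itself is the design choice: anything cut when the budget is tight should be the thing you can best afford to lose.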
When To Use
Treat context engineering as a first-class engineering discipline on every production agent. Document what goes into the context window and why. When quality regresses, audit the context before assuming the model is at fault.
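One lightweight way to make the context auditable is to log a fingerprint of every part of the window per request, so a quality regression can be traced to a context change rather than blamed on the model. A sketch; the record fields and file format here are assumptions:

```python
import hashlib
import json
import time

def log_context(parts: dict, path="context_audit.jsonl"):
    """Append one audit record per request describing what went into
    the context window: a short hash and a size for each named part
    (system prompt, retrievals, history, ...)."""
    record = {
        "timestamp": time.time(),
        # Hashes make it easy to diff context between deployments
        # without storing full (possibly sensitive) text.
        "part_hashes": {
            name: hashlib.sha256(text.encode()).hexdigest()[:12]
            for name, text in parts.items()
        },
        "part_chars": {name: len(text) for name, text in parts.items()},
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

When quality drops, diffing the hashes across dates shows exactly which part of the window changed.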