
Summary Memory

Memory pattern that compresses older conversation turns into a running summary, keeping the working set small as conversations grow.

Last updated: April 26, 2026

Definition

Summary memory replaces older verbatim turns with a model-generated running summary. As the conversation grows, the agent periodically (every N turns or every M tokens) calls a separate LLM to summarize the oldest turns and replaces them in the message array with that summary. The most recent turns stay verbatim; everything older collapses into a short summary (say, 200 words). The benefit: conversations of unlimited length still fit in context. The cost: information loss in the summarization step (specific quotes and details get lost).
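The mechanism above can be sketched in a few lines. This is a minimal illustration, not a production implementation: `summarize_fn` stands in for the separate LLM call, and the class and parameter names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SummaryMemory:
    """Keep the last `keep_recent` turns verbatim; fold anything older
    into a single running summary string."""
    keep_recent: int = 10
    summary: str = ""
    turns: list = field(default_factory=list)

    def add(self, role: str, content: str, summarize_fn):
        self.turns.append({"role": role, "content": content})
        if len(self.turns) > self.keep_recent:
            # Oldest turns overflow the verbatim window...
            overflow = self.turns[: -self.keep_recent]
            self.turns = self.turns[-self.keep_recent:]
            # ...and get merged into the running summary. In production,
            # summarize_fn would call a separate LLM; here it is any
            # callable (old_summary, old_turns) -> new_summary.
            self.summary = summarize_fn(self.summary, overflow)

    def context(self) -> list:
        """Message array to send to the model: summary first, then recent turns."""
        msgs = []
        if self.summary:
            msgs.append({"role": "system",
                         "content": f"Conversation so far: {self.summary}"})
        return msgs + self.turns
```

Note the trade-off is visible in the code: `context()` is bounded by `keep_recent` plus one summary message regardless of conversation length, but anything in `overflow` survives only as well as `summarize_fn` preserves it.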

Hybrid memory is what most production systems use: recent turns verbatim (say, the last 10), a running summary of older turns (periodically recompressed), plus episodic memory of named entities and decisions (stored separately). This combines the precision of verbatim memory for recent context with the unbounded scaling of summarization for older history. The summarization prompt matters: instruct the summarizer to preserve specific named entities, numbers, decisions, and commitments while compressing prose.
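One hypothetical way to phrase that summarization prompt (the exact wording and template variables here are illustrative, not a canonical prompt):

```python
# Illustrative summarizer prompt: the instructions single out entities,
# numbers, decisions, and commitments as must-keep content.
SUMMARIZER_PROMPT = """You are compressing older turns of a conversation.
Rewrite the existing summary plus the new turns into one summary of at
most 200 words. You MUST preserve exactly:
- named entities (people, products, companies)
- numbers, dates, and amounts
- decisions made and commitments given
Compress descriptive prose aggressively; never drop an entity or a number.

Existing summary:
{summary}

New turns to fold in:
{turns}
"""

def build_summarizer_input(summary: str, turns: list) -> str:
    """Render the prompt for one summarization call."""
    rendered = "\n".join(f"{t['role']}: {t['content']}" for t in turns)
    return SUMMARIZER_PROMPT.format(summary=summary or "(none)", turns=rendered)
```

The string returned by `build_summarizer_input` is what you would send to the separate summarizer LLM.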

When To Use

Use summary memory for any conversation that may exceed 20 turns. Tune the summary frequency to balance information loss against context-window pressure.
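A common way to tune that trigger is by token budget rather than turn count. A minimal sketch, assuming a stand-in token counter (character count here; a real system would use the model's tokenizer):

```python
def should_summarize(turns: list, token_budget: int = 4000,
                     count_tokens=len) -> bool:
    """Return True when the verbatim turns exceed the token budget.

    `count_tokens` is a placeholder (character count by default); swap in
    a real tokenizer in production. A larger budget loses less detail but
    leaves less room for the rest of the prompt.
    """
    used = sum(count_tokens(t["content"]) for t in turns)
    return used > token_budget
```

Checking this after every turn, and summarizing only when it fires, keeps summarization cost proportional to conversation length rather than running on a fixed schedule.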

Building with Summary Memory?

I've shipped this pattern in real production systems. If you want a second pair of eyes on your architecture, that's what I do.