
Summary Memory

Memory pattern that compresses older conversation turns into a running summary, keeping the working set small as conversations grow.

Last updated: April 26, 2026

Definition

Summary memory replaces older verbatim turns with a model-generated running summary. As the conversation grows, the agent periodically (every N turns or every M tokens) calls a separate LLM to summarize the oldest turns and replaces them in the message array with that summary. The most recent turns stay verbatim; everything older collapses into a short summary (say, 200 words). The benefit: conversations of unlimited length still fit in context. The cost: information loss in the summarization step (specific quotes and details get lost).
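The mechanism above can be sketched in a few lines. This is a minimal illustration, not a production implementation: `summarize_fn` stands in for the separate LLM call, and the class and parameter names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SummaryMemory:
    """Keep the last `keep_recent` turns verbatim; fold anything older
    into a single running summary string."""
    keep_recent: int = 10
    summary: str = ""
    turns: list = field(default_factory=list)

    def add(self, role: str, content: str, summarize_fn):
        self.turns.append({"role": role, "content": content})
        if len(self.turns) > self.keep_recent:
            # Oldest turns overflow the verbatim window...
            overflow = self.turns[: -self.keep_recent]
            self.turns = self.turns[-self.keep_recent:]
            # ...and get merged into the running summary. In production,
            # summarize_fn would call a separate LLM; here it is any
            # callable (old_summary, old_turns) -> new_summary.
            self.summary = summarize_fn(self.summary, overflow)

    def context(self) -> list:
        """Message array to send to the model: summary first, then recent turns."""
        msgs = []
        if self.summary:
            msgs.append({"role": "system",
                         "content": f"Conversation so far: {self.summary}"})
        return msgs + self.turns
```

Note the trade-off is visible in the code: `context()` is bounded by `keep_recent` plus one summary message regardless of conversation length, but anything in `overflow` survives only as well as `summarize_fn` preserves it.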

Hybrid memory is what most production systems use: recent turns verbatim (say, the last 10), a running summary of older turns (periodically recompressed), plus episodic memory of named entities and decisions (stored separately). This combines the precision of verbatim memory for recent context with the unbounded scaling of summarization for older history. The summarization prompt matters: instruct the summarizer to preserve specific named entities, numbers, decisions, and commitments while compressing prose.
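One hypothetical way to phrase that summarization prompt (the exact wording and template variables here are illustrative, not a canonical prompt):

```python
# Illustrative summarizer prompt: the instructions single out entities,
# numbers, decisions, and commitments as must-keep content.
SUMMARIZER_PROMPT = """You are compressing older turns of a conversation.
Rewrite the existing summary plus the new turns into one summary of at
most 200 words. You MUST preserve exactly:
- named entities (people, products, companies)
- numbers, dates, and amounts
- decisions made and commitments given
Compress descriptive prose aggressively; never drop an entity or a number.

Existing summary:
{summary}

New turns to fold in:
{turns}
"""

def build_summarizer_input(summary: str, turns: list) -> str:
    """Render the prompt for one summarization call."""
    rendered = "\n".join(f"{t['role']}: {t['content']}" for t in turns)
    return SUMMARIZER_PROMPT.format(summary=summary or "(none)", turns=rendered)
```

The string returned by `build_summarizer_input` is what you would send to the separate summarizer LLM.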

When To Use

Use summary memory for any conversation that may exceed 20 turns. Tune the summary frequency to balance information loss against context-window pressure.
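A common way to tune that trigger is by token budget rather than turn count. A minimal sketch, assuming a stand-in token counter (character count here; a real system would use the model's tokenizer):

```python
def should_summarize(turns: list, token_budget: int = 4000,
                     count_tokens=len) -> bool:
    """Return True when the verbatim turns exceed the token budget.

    `count_tokens` is a placeholder (character count by default); swap in
    a real tokenizer in production. A larger budget loses less detail but
    leaves less room for the rest of the prompt.
    """
    used = sum(count_tokens(t["content"]) for t in turns)
    return used > token_budget
```

Checking this after every turn, and summarizing only when it fires, keeps summarization cost proportional to conversation length rather than running on a fixed schedule.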

Building with Summary Memory?

I've shipped this pattern in real production systems. If you want a second pair of eyes on your architecture, that's what I do.