Long-Term Memory

Persistent storage outside the context window. Typically a vector database, key-value store, or both.

Last updated: April 26, 2026

Definition

Long-term memory is anything that survives between sessions. Three common shapes: (1) a vector store of past conversations, retrieved on demand by similarity; (2) a structured KV store of user facts ("Alice prefers email summaries on Mondays"); (3) episodic logs of past actions. The hard part is not storage. It is retrieval. The model needs the right memory at the right moment, which means the retrieval step has to be designed around what to look up: what the query is, how it is filtered, and how many results come back.
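To make the three shapes concrete, here is a minimal in-memory sketch. Everything in it is an assumption for illustration: `LongTermMemory`, `embed`, and `cosine` are hypothetical names, and the bag-of-words `embed` is a toy stand-in for a real embedding model. A production system would likely back each shape with a real store.

```python
import math
import time
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model (assumption): bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class LongTermMemory:
    # (1) vector store: (embedding, original text) pairs
    snippets: list[tuple[Counter, str]] = field(default_factory=list)
    # (2) structured KV store of user facts
    facts: dict[str, str] = field(default_factory=dict)
    # (3) episodic log: (timestamp, action) pairs, append-only
    episodes: list[tuple[float, str]] = field(default_factory=list)

    def remember(self, text: str) -> None:
        self.snippets.append((embed(text), text))

    def log_action(self, action: str) -> None:
        self.episodes.append((time.time(), action))

    def recall(self, query: str, top_k: int = 2) -> list[str]:
        # Rank stored snippets by similarity to the query, best first.
        q = embed(query)
        ranked = sorted(self.snippets, key=lambda s: cosine(q, s[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]


memory = LongTermMemory()
memory.remember("Alice asked for a weekly summary of open invoices")
memory.remember("Alice reported a login bug on the mobile app")
memory.facts["summary_schedule"] = "email summaries on Mondays"
memory.log_action("sent invoice summary email")
print(memory.recall("which invoices are still open", top_k=1))
```

Note that only shape (1) needs similarity search. Facts are read by key and the episodic log is read in order, which is why retrieval quality mostly comes down to how `recall` is queried.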

Code Example

python

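# Retrieve memories before responding.
# vector_store, kv_store, llm, and BASE_PROMPT are assumed to be defined elsewhere in the application.
async def respond(user_id: str, message: str) -> str:
    # Similarity search over past conversations, scoped to this user.
    memories = await vector_store.search(
        query=message, filter={"user_id": user_id}, top_k=5,
    )
    # Structured facts come from a direct lookup, not a similarity search.
    facts = await kv_store.get_user_facts(user_id)
    # Both kinds of memory are injected into the system prompt.
    return await llm.complete(
        system=BASE_PROMPT.format(memories=memories, facts=facts),
        messages=[{"role": "user", "content": message}],
    )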

Retrieve before generating. Both vector and structured memory feed the prompt.
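The example above passes `memories` and `facts` straight into `BASE_PROMPT.format(...)`, which would render them as raw Python reprs. One way to make them readable for the model is to render each as a short bulleted section first. This is a sketch under assumptions: the template text, the `render_system_prompt` helper, and the `max_chars` cap are all hypothetical, not part of the original example.

```python
# Hypothetical template; the original BASE_PROMPT is not shown in the source.
BASE_PROMPT = (
    "You are a helpful assistant.\n\n"
    "Known facts about the user:\n{facts}\n\n"
    "Relevant past conversations:\n{memories}"
)


def render_system_prompt(
    memories: list[str], facts: dict[str, str], max_chars: int = 500
) -> str:
    # One bullet per fact, sorted by key so the prompt is deterministic.
    fact_lines = "\n".join(f"- {k}: {v}" for k, v in sorted(facts.items()))
    # One bullet per memory, truncated so a long snippet cannot crowd out the rest.
    memory_lines = "\n".join(f"- {m[:max_chars]}" for m in memories)
    return BASE_PROMPT.format(
        facts=fact_lines or "- (none)",
        memories=memory_lines or "- (none)",
    )


print(render_system_prompt(
    memories=["Alice asked for a weekly summary of open invoices"],
    facts={"summary_schedule": "email summaries on Mondays"},
))
```

Rendering an explicit "(none)" for empty sections keeps the prompt shape stable for first-time users, so the model does not have to guess whether a section is missing or empty.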

When To Use

As soon as your agent has repeat users. Without it, every conversation starts cold and feels generic.

Related Terms

Short-Term Memory

The conversation history kept in the LLM's context window during a single sessio…

RAG (Retrieval-Augmented Generation)

Fetching relevant documents at query time and injecting them into the LLM prompt…

Vector Database

A database optimized for storing embeddings and finding nearest neighbors at sca…
