Advanced RAG
RAG enhanced with pre-retrieval (query rewriting, expansion) and post-retrieval (reranking, contextual filtering) optimization steps.
Last updated: April 26, 2026
Definition
Advanced RAG adds steps before and after the basic retrieve-and-generate flow. Pre-retrieval: rewrite the user's vague query into a more retrieval-friendly form, expand it into multiple variants, or decompose it into sub-queries. Post-retrieval: rerank candidates with a cross-encoder model, filter by metadata, deduplicate near-identical chunks, or compress long retrievals to fit the context window. Each step trades cost (more LLM calls, more compute, more latency) for retrieval quality. Advanced RAG is the standard production pattern as of 2026; naive RAG is largely confined to prototypes.
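To make the post-retrieval steps concrete, here is a minimal sketch of metadata filtering, near-duplicate removal, and trimming to a context budget. Everything in it is an illustrative assumption rather than a reference implementation: the `Chunk` type, the function names, the word-level Jaccard similarity used to detect near-duplicates, and the word count used as a stand-in for a real token count.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    score: float                      # similarity score from the first-stage retriever
    metadata: dict = field(default_factory=dict)


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity; a cheap proxy for 'near-identical'."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def filter_by_metadata(chunks: list[Chunk], required: dict) -> list[Chunk]:
    """Keep only chunks whose metadata matches every required key/value."""
    return [c for c in chunks
            if all(c.metadata.get(k) == v for k, v in required.items())]


def deduplicate(chunks: list[Chunk], threshold: float = 0.8) -> list[Chunk]:
    """Walk chunks from best to worst score and drop any chunk that is
    too similar to one already kept."""
    kept: list[Chunk] = []
    for c in sorted(chunks, key=lambda c: c.score, reverse=True):
        if all(jaccard(c.text, k.text) < threshold for k in kept):
            kept.append(c)
    return kept


def fit_to_budget(chunks: list[Chunk], max_words: int) -> list[Chunk]:
    """Take chunks in order until the word budget would be exceeded.
    Word count is an assumed stand-in for a tokenizer."""
    out, used = [], 0
    for c in chunks:
        n = len(c.text.split())
        if used + n > max_words:
            break
        out.append(c)
        used += n
    return out


def post_retrieve(chunks: list[Chunk], required: dict, max_words: int) -> list[Chunk]:
    """Filter, then deduplicate, then trim to the context budget."""
    return fit_to_budget(deduplicate(filter_by_metadata(chunks, required)), max_words)
```

Filtering runs first because it is the cheapest step and shrinks the set that the pairwise deduplication has to compare.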
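The two upgrades that pay back fastest in practice: query rewriting (have a small LLM call rewrite the user's question into a better retrieval query before embedding) and reranking (retrieve top 50 by vector similarity, rerank to top 5 with a cross-encoder). Together they often improve recall@5 by 20 to 40 percent on benchmark RAG tasks, though the gain depends heavily on the corpus and on how strong the baseline retriever already is. Anthropic's "Contextual Retrieval" technique (having an LLM write a short, chunk-specific context that situates the chunk within its source document, then prepending it to the chunk before embedding and BM25 indexing) is a further option: Anthropic reported that it cut failed retrievals by about 35 percent with contextual embeddings alone and by about 49 percent when combined with contextual BM25.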
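The rewrite-then-rerank flow can be sketched as a two-stage function. This is a toy under stated assumptions: `bow_cosine` stands in for embedding similarity, `phrase_overlap` stands in for a cross-encoder, and `rewrite` is any callable you supply (in a real system, a small LLM call). None of these names come from a particular library.

```python
import math
from collections import Counter
from typing import Callable, Sequence


def bow_cosine(query: str, doc: str) -> float:
    """Cheap first-stage score: cosine similarity of bag-of-words counts.
    Assumed stand-in for vector similarity over embeddings."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0


def phrase_overlap(query: str, doc: str) -> float:
    """Toy pairwise scorer: fraction of query bigrams found in the doc.
    Assumed stand-in for a cross-encoder, which also scores the
    (query, doc) pair jointly but with a trained model."""
    q, d = query.lower().split(), doc.lower().split()
    q_bigrams = set(zip(q, q[1:]))
    if not q_bigrams:
        return 0.0
    return len(q_bigrams & set(zip(d, d[1:]))) / len(q_bigrams)


def retrieve_and_rerank(
    question: str,
    corpus: Sequence[str],
    rewrite: Callable[[str], str],
    first_stage: Callable[[str, str], float] = bow_cosine,
    reranker: Callable[[str, str], float] = phrase_overlap,
    fetch_k: int = 50,
    final_k: int = 5,
) -> list[str]:
    # Pre-retrieval: turn the raw question into a retrieval-friendly query.
    query = rewrite(question)
    # Stage 1: wide, cheap recall over the whole corpus.
    candidates = sorted(corpus, key=lambda doc: first_stage(query, doc),
                        reverse=True)[:fetch_k]
    # Stage 2: narrow, more expensive precision over the candidates only.
    return sorted(candidates, key=lambda doc: reranker(query, doc),
                  reverse=True)[:final_k]
```

The point of the split is cost: the expensive scorer runs on `fetch_k` candidates instead of the full corpus, so `fetch_k` is the knob that trades latency against how many good chunks the reranker gets a chance to promote.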
When To Use
Move from naive to advanced RAG when retrieval quality plateaus. Start with reranking; it is the highest-leverage single addition.
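Spotting a plateau requires a number to watch. A minimal sketch of recall@k over a small labeled evaluation set follows; the function names and the `(retrieved_ids, relevant_ids)` data shape are assumptions for illustration, not part of any specific evaluation library.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k over (retrieved_ids, relevant_ids) pairs,
    one pair per evaluation query."""
    if not runs:
        return 0.0
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)
```

Running the same evaluation queries through the pipeline with and without a reranker, and comparing `mean_recall_at_k` at the k you actually pass to the generator, gives a direct read on whether the extra step is paying for itself.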