Reranking
A second-stage model that re-orders retrieved chunks by true relevance, not just embedding similarity.
Last updated: April 26, 2026
Definition
Initial retrieval (vector search) optimizes for recall: get many candidate chunks, fast. Reranking optimizes for precision: pick the ones that are actually relevant. A reranker is a cross-encoder model (Cohere Rerank, Voyage Rerank, or the open-source bge-reranker), smaller than the generating LLM, that reads the query and each chunk together and scores the pair. That is more accurate than comparing two independently computed embeddings. Typical pattern: retrieve the top 50 by vector similarity, rerank down to the top 5, and send those to the LLM. The cost is one extra API call (and its latency) per query; the quality lift is usually 10-30 percent on retrieval benchmarks.
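To make the recall/precision split concrete, here is a minimal, dependency-free sketch of the two-stage pattern. It is a toy and does not reflect how production rerankers work internally. `embed` is a bag-of-words stand-in for an embedding model, and `cross_score` is a hypothetical hand-written scorer standing in for a cross-encoder. What it does show is the structure. Stage 1 scores each document independently against the query and keeps `top_k`. Stage 2 rescores only those candidates with a scorer that looks at the query and document together and keeps `top_n`. All function names here are made up for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: a bag-of-words vector,
    # computed for each text on its own (bi-encoder style).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cross_score(query: str, doc: str) -> float:
    # Hypothetical stand-in for a cross-encoder: it sees query and doc together,
    # rewarding query-term coverage plus adjacent query word pairs found in the doc.
    q, d = query.lower().split(), doc.lower().split()
    if not q:
        return 0.0
    coverage = len(set(q) & set(d)) / len(set(q))
    d_bigrams = set(zip(d, d[1:]))
    q_bigrams = list(zip(q, q[1:]))
    order = sum(bg in d_bigrams for bg in q_bigrams) / max(len(q_bigrams), 1)
    return coverage + order

def two_stage(query: str, docs: list[str], top_k: int = 50, top_n: int = 5) -> list[str]:
    q_vec = embed(query)
    # Stage 1 (recall): rank every doc by vector similarity, keep top_k.
    candidates = sorted(docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True)[:top_k]
    # Stage 2 (precision): rescore only the candidates with the pair scorer, keep top_n.
    return sorted(candidates, key=lambda d: cross_score(query, d), reverse=True)[:top_n]

docs = ["key key key api", "how to reset your api key in settings", "billing and invoices"]
print(two_stage("reset api key", docs, top_k=2, top_n=1))
# ['how to reset your api key in settings']
```

Plain vector similarity ranks the keyword-stuffed "key key key api" first. The pair scorer then promotes the chunk that actually answers the query. A real cross-encoder learns this kind of judgment from data instead of following hand-written rules.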
Code Example
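```python
# Two-stage retrieval with the Cohere Python SDK.
# `vector_store` and `query` are placeholders for your own index and user input.
import os
import cohere

co = cohere.ClientV2(api_key=os.environ["CO_API_KEY"])
candidates = vector_store.search(query, top_k=50)  # stage 1: broad, recall-oriented
reranked = co.rerank(                              # stage 2: narrow, precision-oriented
    model="rerank-v3.5", query=query, documents=[c.text for c in candidates], top_n=5,
)
top_chunks = [candidates[r.index] for r in reranked.results]
```

Retrieve broadly, rerank narrowly. The reranker pays for itself in answer quality.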
When To Use
Add reranking when retrieval quality plateaus. Most production RAG systems have it; prototypes usually do not.
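Rather than guessing when retrieval has plateaued, one option is to measure it. Run a small labeled query set through retrieval with and without the rerank stage, then compare precision@k. This sketch assumes that for each query you already have a ranked list of chunk IDs and a set of IDs a human marked as relevant. The helper names are hypothetical, and the IDs in the usage lines are made up to show the arithmetic. They are not measured results.

```python
def precision_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the top-k retrieved IDs that are labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / k

def mean_precision_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average precision@k over (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(precision_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)

# Made-up example: the same two labeled queries, before and after reranking.
vector_only = [
    (["d7", "d2", "d9", "d1", "d4"], {"d2", "d4"}),
    (["d3", "d8", "d5", "d6", "d0"], {"d5"}),
]
reranked = [
    (["d2", "d4", "d7", "d9", "d1"], {"d2", "d4"}),
    (["d5", "d3", "d8", "d6", "d0"], {"d5"}),
]
print(mean_precision_at_k(vector_only, k=2))  # 0.25
print(mean_precision_at_k(reranked, k=2))     # 0.75
```

Precision@k fits this pattern because the LLM only sees the few chunks that survive reranking. What matters is how many of those few are relevant, not how many relevant chunks exist somewhere in the top 50.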