Jahanzaib

Agentic RAG

RAG where the LLM autonomously decides what to retrieve and when, instead of one-shot retrieve-then-generate.

Last updated: April 26, 2026

Definition

Agentic RAG hands control of retrieval to the agent. Instead of the system retrieving once before generation, the agent decides whether retrieval is needed, what query to issue, whether to follow up with another query, and when it has enough information to answer. Retrieval becomes just another tool the agent can call. Amazon Bedrock made this generally available in 2026. The benefit: dramatically better answers on multi-hop questions, where the answer depends on combining several retrieved facts ("how does X compare to Y, and which has been more affected by recent regulation?"). The cost: 3 to 5x more retrieval calls per question, with the added latency and spend that implies.
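The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: agentic_rag, AgentState, toy_plan and toy_retrieve are all hypothetical names, the planner is a hard-coded stand-in for an LLM call, and the retriever is a dictionary standing in for a real search index. The max_calls cap is an assumption added here to bound the extra retrieval cost.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AgentState:
    question: str
    queries: List[str] = field(default_factory=list)   # queries issued so far
    evidence: List[str] = field(default_factory=list)  # passages collected so far


# The planner returns the next query, or None once it has enough to answer.
Planner = Callable[[AgentState], Optional[str]]
Retriever = Callable[[str], List[str]]


def agentic_rag(question: str, plan: Planner, retrieve: Retriever,
                max_calls: int = 5) -> AgentState:
    """Let the planner decide whether, what, and how often to retrieve."""
    state = AgentState(question)
    while len(state.queries) < max_calls:
        query = plan(state)
        if query is None:  # planner judges the evidence sufficient
            break
        state.queries.append(query)
        state.evidence.extend(retrieve(query))
    return state


# --- Toy stand-ins (assumptions, for illustration only) ---

def toy_retrieve(query: str) -> List[str]:
    docs = {"x": ["passage about X"], "y": ["passage about Y"]}
    return docs.get(query, [])


def toy_plan(state: AgentState) -> Optional[str]:
    # A real planner would be an LLM call; this one queries each topic
    # named in the question exactly once, then stops.
    words = state.question.lower().split()
    for topic in ("x", "y"):
        if topic in words and topic not in state.queries:
            return topic
    return None


if __name__ == "__main__":
    result = agentic_rag("compare x and y", toy_plan, toy_retrieve)
    print(result.queries, result.evidence)
```

A question that names neither topic produces zero retrieval calls, which is the "decides whether retrieval is needed" part of the pattern. Generation from the collected evidence is left out to keep the sketch focused on the retrieval loop.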

Agentic RAG only pays off when the workload genuinely has multi-hop questions. For single-fact lookup workloads ("what is our refund policy?") agentic RAG is wasteful: the agent makes extra retrieval calls that do not improve the answer. The right pattern is conditional: classify the incoming question first (single-hop vs multi-hop), route single-hop questions to naive or advanced RAG, and route multi-hop questions to agentic RAG. Compared with running everything through the agent, customers get lower latency on simple questions and keep the quality improvement on complex ones.
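A minimal sketch of that conditional routing, assuming a keyword heuristic as the classifier. In practice the classifier would more likely be a small model or an LLM call; the cue list and the classify and route function names here are hypothetical and chosen only for illustration.

```python
import re
from typing import Callable

# Assumed surface cues that suggest a comparative or multi-hop question.
MULTI_HOP_CUES = re.compile(
    r"\b(compare|compared|versus|vs|and which|difference between)\b",
    re.IGNORECASE,
)


def classify(question: str) -> str:
    """Label a question as 'multi_hop' or 'single_hop' using the cue list."""
    return "multi_hop" if MULTI_HOP_CUES.search(question) else "single_hop"


def route(question: str,
          single_hop_pipeline: Callable[[str], str],
          agentic_pipeline: Callable[[str], str]) -> str:
    """Send the question to the cheaper pipeline unless it looks multi-hop."""
    if classify(question) == "multi_hop":
        return agentic_pipeline(question)
    return single_hop_pipeline(question)


if __name__ == "__main__":
    # Placeholder pipelines that only report which path was taken.
    naive = lambda q: "naive RAG"
    agentic = lambda q: "agentic RAG"
    print(route("What is our refund policy?", naive, agentic))
    print(route("How does X compare to Y, and which has been more "
                "affected by recent regulation?", naive, agentic))
```

A misclassified multi-hop question gets a weaker answer from the one-shot path, so a router like this is worth evaluating against a labelled sample of real traffic before relying on it.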

When To Use

Use agentic RAG when your queries are routinely multi-hop or comparative. Skip it for high-volume single-fact workloads, where the extra retrieval calls cost more than the quality gain is worth.

Building with Agentic RAG?

I've shipped this pattern in real production systems. If you want a second pair of eyes on your architecture, that's what I do.