Reflexion

Reasoning pattern where the agent critiques its own previous output and iterates toward a better answer in a feedback loop.

Last updated: April 26, 2026

Definition

Reflexion (Shinn et al. 2023) adds a self-critique step to the agent loop. After producing an answer, the agent receives a feedback signal (a failing test, an environment error, or its own evaluation) and writes a short verbal reflection on whether the answer is correct, what evidence it used, and what to change. That reflection is kept in memory and fed into the next attempt. The pattern substantially improves task success rates on benchmarks where the agent can detect its own mistakes (code that fails to compile or pass its tests, plans that violate constraints, math that does not check out); the original paper reports 91% pass@1 on HumanEval, against the 80% previously reported for GPT-4. It helps much less on tasks where the model cannot evaluate its own output reliably.
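A minimal Python sketch of the loop described above, assuming three hypothetical callables: `actor` proposes an answer given the task and past reflections, `evaluator` returns a pass/fail flag plus feedback, and `reflector` turns that feedback into a short lesson. In a real agent each would wrap a model call or a test runner; here they are toy stubs so the example runs on its own. It illustrates the pattern and is not the paper's reference code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (assumptions for this sketch):
#   actor(task, reflections)          -> candidate answer
#   evaluator(task, answer)           -> (passed, feedback)
#   reflector(task, answer, feedback) -> short verbal lesson
Actor = Callable[[str, List[str]], str]
Evaluator = Callable[[str, str], Tuple[bool, str]]
Reflector = Callable[[str, str, str], str]


@dataclass
class ReflexionResult:
    answer: Optional[str]
    passed: bool
    trials: int
    reflections: List[str] = field(default_factory=list)


def reflexion_loop(task: str, actor: Actor, evaluator: Evaluator,
                   reflector: Reflector, max_trials: int = 3,
                   memory_size: int = 3) -> ReflexionResult:
    reflections: List[str] = []  # memory carried from one trial to the next
    answer: Optional[str] = None
    for trial in range(1, max_trials + 1):
        # Act: the actor sees the task plus the most recent reflections.
        answer = actor(task, reflections[-memory_size:])
        # Evaluate: an external check (tests, constraints) or a self-assessment.
        passed, feedback = evaluator(task, answer)
        if passed:
            return ReflexionResult(answer, True, trial, reflections)
        # Reflect: turn the failure signal into a lesson for the next trial.
        reflections.append(reflector(task, answer, feedback))
    return ReflexionResult(answer, False, max_trials, reflections)


# Toy stubs so the sketch runs without a model.
def toy_actor(task: str, reflections: List[str]) -> str:
    # Stand-in for an LLM: it fixes its bug only once a reflection points it out.
    return "a + b" if any("addition" in r for r in reflections) else "a - b"


def toy_evaluator(task: str, answer: str) -> Tuple[bool, str]:
    fn = eval(f"lambda a, b: {answer}")  # toy only; never eval untrusted model output
    failures = [(a, b) for a, b in [(1, 2), (5, 0)] if fn(a, b) != a + b]
    return (not failures, f"failed on inputs {failures}" if failures else "all tests passed")


def toy_reflector(task: str, answer: str, feedback: str) -> str:
    return f"'{answer}' {feedback}; the task asks for addition, not subtraction."


result = reflexion_loop("return the sum of a and b", toy_actor, toy_evaluator, toy_reflector)
print(result.passed, result.trials, result.answer)  # True 2 a + b
```

The stopping rule matters as much as the reflection step: `max_trials` bounds cost when the evaluator keeps failing, and `memory_size` keeps the prompt from growing with every trial.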

Two production patterns derive from Reflexion. The first is automated retry on tool failure: when a tool returns an error, feed the error back to the model and ask it to revise the call. The second is a self-check before commit: before an agent takes a destructive action (writing to a database, sending an email), ask the model to review its decision against the original request. Both are lighter-weight than full Reflexion but capture most of its reliability benefit. The full Reflexion loop is more useful in research-style agents than in production transactional ones.
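The first pattern, retry on tool failure, can be sketched in a few lines of Python. `get_weather`, `ToolError`, and `stub_model` are hypothetical stand-ins: the tool raises an error the way a real API would return one, and the stub plays the model, fixing its arguments only once an error message tells it what was wrong. Each error is appended to the context for the next attempt, and the loop gives up after `max_attempts`.

```python
import json
from typing import Any, Callable, Dict, List


class ToolError(Exception):
    """Error reported by a tool (bad arguments, upstream failure, ...)."""


def get_weather(args: Dict[str, Any]) -> str:
    # Hypothetical tool that rejects calls missing a required argument.
    if "city" not in args:
        raise ToolError("missing required argument 'city'")
    return f"Sunny in {args['city']}"


def call_with_retry(propose_call: Callable[[str, List[str]], Dict[str, Any]],
                    tool: Callable[[Dict[str, Any]], str],
                    request: str, max_attempts: int = 3) -> str:
    errors: List[str] = []
    for _ in range(max_attempts):
        # The model proposes arguments, seeing every earlier error message.
        args = propose_call(request, errors)
        try:
            return tool(args)
        except ToolError as exc:
            # Feed the error back instead of failing the whole task.
            errors.append(f"call {json.dumps(args)} failed: {exc}")
    raise RuntimeError(f"tool call failed after {max_attempts} attempts: {errors}")


def stub_model(request: str, errors: List[str]) -> Dict[str, Any]:
    # Stand-in for an LLM: it uses the wrong key until an error names the right one.
    if any("'city'" in e for e in errors):
        return {"city": "Paris"}
    return {"location": "Paris"}


print(call_with_retry(stub_model, get_weather, "What's the weather in Paris?"))
# Sunny in Paris
```

Catching only the tool's own error type is a deliberate choice in this sketch: a bug in the agent code should surface immediately rather than be fed back to the model as something to retry.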

When To Use

Add a Reflexion-style self-check before any irreversible action. Use the full loop for agents that solve open-ended problems where success can be measured (code that compiles, plans that satisfy constraints).
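A sketch of the self-check before an irreversible action, assuming a hypothetical `Action` record, a `review_model` callable that would wrap an LLM call, and an `execute` function that performs the side effect. The reviewer sees the original request next to the proposed action, and the gate fails closed: anything other than an explicit APPROVE blocks execution. The lambdas in the usage lines stand in for real model replies.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Action:
    kind: str      # hypothetical action type, e.g. "send_email"
    target: str    # recipient, table name, etc.
    payload: str


REVIEW_PROMPT = (
    "Original request:\n{request}\n\n"
    "Proposed action:\n{action}\n\n"
    "Does the action do what the request asked, and nothing more? "
    "Reply APPROVE, or REJECT: <reason>."
)


def guarded_commit(request: str, action: Action,
                   review_model: Callable[[str], str],
                   execute: Callable[[Action], str]) -> str:
    """Ask the model to review its own decision before an irreversible step."""
    prompt = REVIEW_PROMPT.format(request=request, action=action)
    verdict = review_model(prompt).strip()
    if verdict.upper().startswith("APPROVE"):
        return execute(action)
    # Fail closed: any reply other than an explicit approval blocks the action.
    return f"blocked: {verdict}"


# Usage with stand-ins for the model and the side effect.
sent: List[Action] = []

def send_email(action: Action) -> str:
    sent.append(action)  # a real implementation would call a mail API here
    return f"sent to {action.target}"

request = "Email Alice the Q3 report."
wrong = Action("send_email", "bob@example.com", "Q3 report attached")
right = Action("send_email", "alice@example.com", "Q3 report attached")
print(guarded_commit(request, wrong, lambda p: "REJECT: recipient is not Alice", send_email))
# blocked: REJECT: recipient is not Alice
print(guarded_commit(request, right, lambda p: "APPROVE", send_email))
# sent to alice@example.com
```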

Related Terms

Chain-of-Thought (CoT)

Prompting technique where the model writes out intermediate reasoning steps befo…

Tree-of-Thought (ToT)

Reasoning pattern that explores multiple branches at each step, evaluates them, …

Self-Correction

An agent's ability to detect errors in its own outputs and revise them without e…

Eval Harness

A test suite that runs the model against a fixed set of inputs and grades output…
