Jahanzaib
Architecture

Agent Harness

The wrapper layer around an LLM that adds the agentic loop, tool routing, memory, and observability.

Last updated: April 26, 2026

Definition

An agent harness is everything that turns a base LLM into an agent. The LLM by itself just predicts tokens. The harness wraps it with: the agentic loop (perceive, reason, act, observe), tool definitions and the dispatcher that runs them, short-term and long-term memory, error handling and retries, observability hooks, cost tracking, and guardrails. Frameworks like LangGraph, OpenAI Agents SDK, CrewAI, Pydantic AI, and Anthropic's reference implementations all ship harnesses. You can also build your own in 200 to 500 lines for a single agent.
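The loop described above can be sketched in a few dozen lines. This is a minimal illustration, not any particular framework's API: the dict-based reply protocol (`tool` / `content` keys) and the tool registry are invented for the example.

```python
from typing import Callable

# Tool registry: the dispatcher looks tools up by name (contents are illustrative).
TOOLS: dict[str, Callable[..., str]] = {
    "add": lambda a, b: str(a + b),
}

def run_agent(llm: Callable[[list[dict]], dict], user_msg: str, max_steps: int = 5) -> str:
    """The agentic loop: ask the model, dispatch any tool call, feed the result back."""
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        reply = llm(messages)                # perceive / reason
        if "tool" in reply:                  # act: the model asked for a tool
            name, args = reply["tool"], reply.get("args", {})
            try:
                result = TOOLS[name](**args)
            except Exception as exc:         # error handling: report, don't crash the loop
                result = f"error: {exc}"
            # observe: the tool result goes back into the conversation
            messages.append({"role": "tool", "name": name, "content": result})
        else:
            return reply["content"]          # final answer, loop ends
    return "error: step budget exhausted"    # hard cap so a confused model can't loop forever
```

The `max_steps` cap is the kind of detail a harness owns: without it, a model that keeps requesting tools burns budget indefinitely.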

The "harness" name comes from machine learning evaluation, where the eval harness wraps the model under test with everything needed to run it. The same pattern applies here, and it has two practical implications. First, the harness is where most production bugs live. The model usually does the right thing once you give it the right tools and context. The harness is what fails: a tool that swallows errors, a memory layer that drops context, a retry loop that exhausts the rate limit. Second, the harness is the right unit to test. Mock the LLM, run the harness against fixture conversations, and assert the right tools are called.
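That testing strategy, mocking the LLM with scripted replies and asserting on dispatched tools, looks roughly like this. The harness here is deliberately tiny, and all names (`ScriptedLLM`, `run_once`, the reply format) are invented for the sketch.

```python
class ScriptedLLM:
    """Replays a fixed list of replies, standing in for the real model."""
    def __init__(self, replies):
        self.replies = iter(replies)
    def __call__(self, messages):
        return next(self.replies)

def run_once(llm, dispatch, user_msg):
    """A one-tool-step harness, just enough to exercise the dispatcher."""
    messages = [{"role": "user", "content": user_msg}]
    reply = llm(messages)
    if "tool" in reply:
        result = dispatch(reply["tool"], reply.get("args", {}))
        messages.append({"role": "tool", "content": result})
        reply = llm(messages)
    return reply["content"]

# Test double: record every dispatch instead of running real tools.
called = []
def fake_dispatch(name, args):
    called.append(name)
    return "ok"

llm = ScriptedLLM([
    {"tool": "search", "args": {"q": "agent harness"}},  # fixture step 1: tool call
    {"content": "done"},                                 # fixture step 2: final answer
])
assert run_once(llm, fake_dispatch, "look it up") == "done"
assert called == ["search"]  # the right tool was called, exactly once
```

Because the model is scripted, this test is deterministic, fast, and free, which is exactly why the harness, not the model, is the unit worth testing.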

When To Use

You always need a harness, even if you do not call it that. The choice is whether to use an off-the-shelf one (LangGraph, Agents SDK) or roll your own. For a single agent with under 10 tools, a custom 300-line harness is often clearer than a framework. For multi-agent or stateful workflows, use a framework.
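If you do roll your own, retry handling is a good example of what those few hundred lines must cover well. A sketch of capped exponential backoff, with the exception type and delay schedule as assumptions (your provider's SDK will raise its own rate-limit error):

```python
import time

class RateLimitError(Exception):
    """Stand-in for a provider 429 error; real SDKs raise their own type."""

def call_with_retry(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Capped exponential backoff: back off instead of hammering the rate limit."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise                             # out of attempts: surface the error
            sleep(base_delay * (2 ** attempt))    # 1s, 2s, 4s, ...
```

The `sleep` parameter is injected so tests can run without real delays, the same mock-the-boundary idea as mocking the LLM.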

Building with Agent Harness?

I've shipped this pattern in real production systems. If you want a second pair of eyes on your architecture, that's what I do.