Production

Drift Detection

Monitoring for gradual degradation in agent quality over time, typically caused by changing input distribution, model updates, or memory drift.

Last updated: April 26, 2026

Definition

Drift detection in LLM systems watches for slow regressions you would miss in a snapshot test. Three drift sources matter most. First, data drift: the kinds of questions users ask shift over time and the agent starts hitting cases it was never tuned for. Second, model drift: the provider ships a silent update and behavior changes. Third, memory drift: the agent's long-term memory accumulates contradictory or stale entries. The detection mechanism is usually a rolling eval: periodically (daily, hourly) re-run a fixed eval set and alert when the score drops below a threshold.

When To Use

Set up drift monitoring as part of LLMOps. The single most useful metric: success rate on your golden eval set, tracked daily. Alert on any 5 percent drop.

Sources

Arize: ML observability docs

Related Terms

Eval Harness

A test suite that runs the model against a fixed set of inputs and grades output…

Observability

Logging, metrics, and tracing for LLM calls so you can debug, audit, and optimiz…

Memory Drift

The gradual degradation of agent behavior as in-context memory accumulates stale…

LLMOps

The MLOps equivalent for LLM-powered systems: prompt versioning, evaluation pipe…

Building with Drift Detection?

I've shipped this pattern in real production systems. If you want a second pair of eyes on your architecture, that's what I do.

Book a discovery call Browse more terms