Exponential Backoff
Retry strategy that doubles the wait between attempts to give a struggling system room to recover.
Last updated: April 26, 2026
Definition
When an LLM API returns 429 (rate limited) or 5xx (transient error), retrying immediately makes things worse. Exponential backoff waits longer between each retry: 1s, 2s, 4s, 8s. Add jitter (random ±25 percent) to prevent thundering-herd retries. Cap total retry time so requests fail fast on permanent errors. The Anthropic and OpenAI SDKs have backoff built in; your job is mostly to not disable it. Custom code paths (background jobs, webhook handlers) need explicit backoff.
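The ±25 percent jitter mentioned above is one line of code; a minimal sketch (the function name is ours, not from any SDK):

```python
import random

def jittered(delay):
    # Scale the base delay by a random factor in [0.75, 1.25],
    # so synchronized clients don't all retry at the same instant.
    return delay * random.uniform(0.75, 1.25)
```

Applied to an 8s backoff step, this yields a wait anywhere between 6s and 10s.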
Code Example
import asyncio, random

from anthropic import RateLimitError  # or: from openai import RateLimitError

async def call_with_backoff(fn, max_retries=5):
    for attempt in range(max_retries):
        try:
            return await fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # retries exhausted: surface the error
            # Exponential delay plus up to 1s of jitter
            wait = (2 ** attempt) + random.uniform(0, 1)
            await asyncio.sleep(wait)

Waits grow 1s, 2s, 4s, 8s, each with jitter. Cap retries so permanent failures fail fast.
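The loop above retries until the attempt budget runs out. A hedged sketch of the total-time cap described in the definition, using a wall-clock deadline (names like `TransientError`, `base`, and `deadline_s` are our assumptions, not SDK API):

```python
import asyncio, random, time

class TransientError(Exception):
    """Stand-in for an SDK's retryable error (429 / 5xx)."""

async def call_with_deadline(fn, max_retries=5, base=1.0, deadline_s=30.0):
    start = time.monotonic()
    for attempt in range(max_retries):
        try:
            return await fn()
        except TransientError:
            wait = base * (2 ** attempt) + random.uniform(0, base)
            # Give up early if the next wait would blow past the deadline,
            # so permanent failures surface fast instead of burning the
            # full retry budget.
            if attempt == max_retries - 1 or time.monotonic() - start + wait > deadline_s:
                raise
            await asyncio.sleep(wait)
```

With `deadline_s=30.0` and `base=1.0`, a request that keeps failing stops retrying within roughly half a minute regardless of how many attempts remain.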
When To Use
Required on every LLM call in production. The default in official SDKs is good. Just don't disable it.
Building with Exponential Backoff?
I've shipped this pattern in real production systems. If you want a second pair of eyes on your architecture, that's what I do.