Latency Budget

The allocation of acceptable delay across each stage of a voice agent pipeline so the total Time to First Audio meets the target.

Last updated: April 26, 2026

Definition

A latency budget is the per-stage allocation that keeps your total Time to First Audio (TTFA) at or under target. For a 1000ms target you might allocate 400ms to turn detection (the largest single component, and hard to shrink), 150ms to the STT final transcript, 300ms to LLM time-to-first-token, and 150ms to the TTS first chunk. That sums to exactly 1000ms, with no buffer. The discipline is to instrument every stage in production and watch which one breaks its allocation first. That stage is your optimization target. Without a budget, every stage feels equally important and nothing gets prioritized.
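As a rough illustration, the example allocation can be written down as data and compared against measured per-stage latencies. This is a minimal sketch, not a reference implementation: the stage names, the `budget_report` and `worst_offender` helpers, and the choice of the 95th percentile are all assumptions made for this example, and the samples would come from whatever instrumentation your pipeline already has.

```python
import math

# Per-stage allocation in milliseconds, copied from the example allocation.
# Stage names are hypothetical labels chosen for this sketch.
BUDGET_MS = {
    "turn_detection": 400,
    "stt_final": 150,
    "llm_ttft": 300,
    "tts_first_chunk": 150,
}
TTFA_TARGET_MS = 1000


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def budget_report(samples_by_stage, pct=95):
    """Map each stage to (observed_ms, allocated_ms, overrun_ms).

    The percentile is an assumption; pick whichever one your target is defined on.
    """
    report = {}
    for stage, allocated in BUDGET_MS.items():
        observed = percentile(samples_by_stage[stage], pct)
        report[stage] = (observed, allocated, observed - allocated)
    return report


def worst_offender(report):
    """Return the stage with the largest positive overrun, or None if all fit."""
    stage, (_, _, overrun) = max(report.items(), key=lambda kv: kv[1][2])
    return stage if overrun > 0 else None
```

Calling `worst_offender(budget_report(samples))` on a week of samples names the stage to optimize first. A `None` result means every stage is within its allocation at the chosen percentile.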

Two tactics consistently move latency budgets. First, keep the LLM small. Switching from Sonnet to Haiku (or GPT-5 to GPT-5 mini) typically halves the time-to-first-token at minimal quality cost for voice tasks, where responses are short and structured. Second, shrink the silence threshold for end-of-turn detection only as far as your false-trigger rate allows. Cutting from 800ms to 400ms saves 400ms on every turn but makes the agent more likely to interrupt mid-sentence when the user pauses to think. The right value is workload-specific: shorter for transactional calls, longer for emotional support calls.
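The silence-threshold trade-off can be made concrete if you have a sample of mid-utterance pause lengths from real calls, meaning pauses after which the caller kept talking. The sketch below assumes such a sample exists. The function names, the candidate thresholds, and the tolerated false-trigger rate are all hypothetical, and real turn detectors often use more signals than silence alone.

```python
def false_trigger_rate(pauses_ms, threshold_ms):
    """Fraction of mid-utterance pauses long enough to be mistaken for end-of-turn."""
    if not pauses_ms:
        return 0.0
    triggered = sum(1 for pause in pauses_ms if pause >= threshold_ms)
    return triggered / len(pauses_ms)


def pick_threshold(pauses_ms, candidates_ms, max_false_trigger_rate):
    """Return the smallest candidate whose false-trigger rate is tolerable.

    Falls back to the largest candidate when none meets the tolerance.
    """
    for threshold in sorted(candidates_ms):
        if false_trigger_rate(pauses_ms, threshold) <= max_false_trigger_rate:
            return threshold
    return max(candidates_ms)
```

Running this separately per workload, for example transactional calls versus support calls, is one way to arrive at different thresholds for each rather than a single global value.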

When To Use

Define the budget before you ship. After launch, instrument and review weekly. Most voice agent latency problems are budget violations in one stage that creep in over time.

Related Terms

Time to First Audio (TTFA)

The total latency from when the user stops speaking to when the agent's first au…

Silence Threshold

The duration of detected silence (typically 300 to 800ms) that triggers the agen…

STT → LLM → TTS Pipeline

The three-stage architecture of every modern voice agent: speech to text, then l…

Turn Detection

How a voice agent decides when the caller has stopped speaking and it is the age…
