Temperature
Sampling parameter that controls randomness in LLM outputs. Lower (0.0) is deterministic and focused; higher (1.0+) is diverse and creative.
Last updated: April 26, 2026
Definition
Temperature is a scaling factor applied to the model's output logits before the softmax that produces the next-token probability distribution: logits are divided by the temperature, so values below 1.0 sharpen the distribution and values above 1.0 flatten it. At temperature 0, sampling reduces to greedy decoding: the model always picks the highest-probability token (deterministic, but prone to repetition). At temperature 1.0, the distribution is unchanged (default randomness). Above 1.0, the distribution is flattened, making low-probability tokens more likely (creative, sometimes incoherent). For agent work, temperature 0 is the right default: you want consistent decisions. For creative writing, 0.7 to 1.0 is typical. Temperature is among the most commonly adjusted sampling parameters in production.
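The scaling described above can be sketched in a few lines. This is a minimal illustration of temperature-scaled softmax, not any particular provider's implementation; the logit values are made up for the example.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by T before softmax: T < 1 sharpens the
    # distribution, T > 1 flattens it. T must be > 0; the T = 0
    # case is handled separately as greedy decoding (argmax).
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical next-token logits
cold = softmax_with_temperature(logits, 0.5)
warm = softmax_with_temperature(logits, 1.5)
# The top token gets a larger share of the probability mass at
# T = 0.5 than at T = 1.5, so low-T sampling is more focused.
```

Running this, `cold[0]` comes out well above `warm[0]`, which is exactly the sharpening/flattening behavior the definition describes.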
When To Use
Set temperature 0 (or near-zero) for tool calling, classification, and structured output. Raise to 0.7+ only for creative tasks where diversity matters more than reliability.
Building with Temperature?
I've shipped this pattern in real production systems. If you want a second pair of eyes on your architecture, that's what I do.