
Temperature

Sampling parameter that controls randomness in LLM outputs. Lower (0.0) is deterministic and focused; higher (1.0+) is diverse and creative.

Last updated: April 26, 2026

Definition

Temperature is a scaling factor applied to the model's output logits before they are turned into a probability distribution for sampling: each logit is divided by the temperature, then softmax is applied. At temperature 0 (in practice, the limit), the model always picks the highest-probability token (greedy decoding: deterministic, often repetitive). Between 0 and 1.0, the distribution is sharpened toward the top tokens. At 1.0, the distribution is unchanged (default randomness). Above 1.0, the distribution is flattened, making low-probability tokens more likely (creative, sometimes incoherent). For agent work, temperature 0 is the right default: you want consistent decisions. For creative writing, 0.7 to 1.0 is typical. Temperature is the single most-tweaked sampling parameter in production.
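The mechanics fit in a few lines. A minimal sketch (the function name and example logits are mine, not from any particular library): divide the logits by the temperature, softmax, and watch the distribution sharpen or flatten.

```python
import numpy as np

def apply_temperature(logits, temperature):
    """Scale logits by temperature, then softmax into a sampling distribution.

    temperature < 1.0 sharpens the distribution toward the top token;
    1.0 leaves it unchanged; > 1.0 flattens it toward uniform.
    """
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()              # subtract max for numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

logits = [2.0, 1.0, 0.1]
print(apply_temperature(logits, 1.0))  # plain softmax
print(apply_temperature(logits, 0.2))  # sharpened: top token dominates
print(apply_temperature(logits, 2.0))  # flattened: closer to uniform
```

Note that temperature only rescales relative likelihoods; it never changes which token is most probable, only how much probability mass the runners-up keep.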

When To Use

Set temperature 0 (or near-zero) for tool calling, classification, and structured output. Raise to 0.7+ only for creative tasks where diversity matters more than reliability.
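The reliability/diversity trade-off is easy to see directly. A toy sketch (again with made-up logits, using greedy argmax as the temperature-0 case, which is how most inference stacks treat it): at 0 every call returns the same token; at 1.0 repeated calls spread across the distribution.

```python
import numpy as np

def sample_token(logits, temperature, rng=None):
    """Pick a token index: greedy argmax at temperature 0, otherwise sample."""
    logits = np.asarray(logits, dtype=float)
    if temperature == 0:
        return int(np.argmax(logits))   # deterministic: same input, same token
    scaled = logits / temperature
    scaled -= scaled.max()
    probs = np.exp(scaled) / np.exp(scaled).sum()
    if rng is None:
        rng = np.random.default_rng()
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.0, 0.1]
# temperature 0: every call picks the same token
print({sample_token(logits, 0) for _ in range(10)})   # {0}
# temperature 1.0: repeated calls hit multiple tokens
rng = np.random.default_rng(0)
print({sample_token(logits, 1.0, rng) for _ in range(50)})
```

This is why temperature 0 belongs on tool calling and structured output: the same prompt yields the same decision, which makes failures reproducible and evals meaningful.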


Building with Temperature?

I've shipped this pattern in real production systems. If you want a second pair of eyes on your architecture, that's what I do.