Temperature
Sampling parameter that controls randomness in LLM outputs. Lower (0.0) is deterministic and focused; higher (1.0+) is diverse and creative.
Last updated: April 26, 2026
Definition
Temperature is a scaling factor applied to the model's output logits before the softmax that produces the next-token probability distribution: logits are divided by the temperature, so values below 1.0 sharpen the distribution and values above 1.0 flatten it. At temperature 0, sampling reduces to greedy decoding: the model always picks the highest-probability token (deterministic, but prone to repetition). At temperature 1.0, the distribution is unchanged (default randomness). Above 1.0, the distribution is flattened, making low-probability tokens more likely (creative, sometimes incoherent). For agent work, temperature 0 is the right default: you want consistent decisions. For creative writing, 0.7 to 1.0 is typical. Temperature is among the most commonly adjusted sampling parameters in production.
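The scaling described above can be sketched in a few lines. This is a minimal illustration of temperature-scaled softmax, not any particular provider's implementation; the logit values are made up for the example.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by T before softmax: T < 1 sharpens the
    # distribution, T > 1 flattens it. T must be > 0; the T = 0
    # case is handled separately as greedy decoding (argmax).
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical next-token logits
cold = softmax_with_temperature(logits, 0.5)
warm = softmax_with_temperature(logits, 1.5)
# The top token gets a larger share of the probability mass at
# T = 0.5 than at T = 1.5, so low-T sampling is more focused.
```

Running this, `cold[0]` comes out well above `warm[0]`, which is exactly the sharpening/flattening behavior the definition describes.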
When To Use
Set temperature 0 (or near-zero) for tool calling, classification, and structured output. Raise to 0.7+ only for creative tasks where diversity matters more than reliability.
Building with Temperature?
I've shipped this pattern in real production systems. If you want a second pair of eyes on your architecture, that's what I do.