
Top-P (Nucleus Sampling)

Sampling parameter that restricts token selection to the smallest set of tokens whose cumulative probability reaches P, cutting off the low-probability long tail.

Last updated: April 26, 2026

Definition

Top-P (also called nucleus sampling, introduced by Holtzman et al., 2019) is an alternative to temperature for controlling output randomness. At each step, the model samples only from the smallest set of tokens whose cumulative probability adds up to at least P (typically 0.9 or 0.95); lower-probability "tail" tokens are excluded entirely. The benefit over temperature is more stable behavior across very different probability distributions: the cutoff adapts to the shape of the distribution, keeping only a few tokens when the model is confident and more when it is uncertain. Use top-P when you want diversity without the model occasionally picking obviously wrong tokens that pure temperature scaling can still sample.
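The procedure above can be sketched in a few lines. This is a minimal illustration, not a production decoder: `top_p_sample` is a hypothetical helper name, and the logits are assumed to be a raw (unnormalized) score vector over the vocabulary.

```python
import numpy as np

def top_p_sample(logits, p=0.9, temperature=1.0, rng=None):
    """Sample one token index using nucleus (top-P) sampling."""
    rng = rng or np.random.default_rng()
    # Temperature-scaled softmax (shifted by the max for numerical stability).
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()
    # Sort tokens by descending probability.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    # Smallest prefix whose cumulative probability reaches at least p.
    cutoff = np.searchsorted(cumulative, p) + 1
    nucleus = order[:cutoff]
    # Renormalize over the nucleus and sample from it.
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))
```

For example, with logits `[2.0, 1.0, -5.0, -5.0]` and `p=0.8`, the two tail tokens fall outside the nucleus and are never sampled, no matter how many draws you take.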

When To Use

Use top-P 0.9 to 0.95 for most production tasks, typically alongside a temperature near 1, and tune one parameter at a time. Note that at temperature 0 decoding is effectively greedy, so top-P has no effect. Raise P toward 1.0 when you want more creative diversity.


Building with Top-P (Nucleus Sampling)?

I've shipped this pattern in real production systems. If you want a second pair of eyes on your architecture, that's what I do.