
Top-P (Nucleus Sampling)

Sampling parameter that restricts token selection to the smallest set of tokens whose cumulative probability reaches P, cutting off the low-probability long tail.

Last updated: April 26, 2026

Definition

Top-P (also called nucleus sampling, introduced by Holtzman et al., 2019) is an alternative to temperature for controlling output randomness. At each step, the model samples only from the smallest set of tokens whose cumulative probability adds up to at least P (typically 0.9 or 0.95); lower-probability "tail" tokens are excluded entirely. The benefit over temperature is more stable behavior across very different probability distributions: the cutoff adapts to the shape of the distribution, keeping only a few tokens when the model is confident and more when it is uncertain. Use top-P when you want diversity without the model occasionally picking obviously wrong tokens that pure temperature scaling can still sample.
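The procedure above can be sketched in a few lines. This is a minimal illustration, not a production decoder: `top_p_sample` is a hypothetical helper name, and the logits are assumed to be a raw (unnormalized) score vector over the vocabulary.

```python
import numpy as np

def top_p_sample(logits, p=0.9, temperature=1.0, rng=None):
    """Sample one token index using nucleus (top-P) sampling."""
    rng = rng or np.random.default_rng()
    # Temperature-scaled softmax (shifted by the max for numerical stability).
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()
    # Sort tokens by descending probability.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    # Smallest prefix whose cumulative probability reaches at least p.
    cutoff = np.searchsorted(cumulative, p) + 1
    nucleus = order[:cutoff]
    # Renormalize over the nucleus and sample from it.
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))
```

For example, with logits `[2.0, 1.0, -5.0, -5.0]` and `p=0.8`, the two tail tokens fall outside the nucleus and are never sampled, no matter how many draws you take.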

When To Use

Use top-P 0.9 to 0.95 for most production tasks, typically alongside a temperature near 1, and tune one parameter at a time. Note that at temperature 0 decoding is effectively greedy, so top-P has no effect. Raise P toward 1.0 when you want more creative diversity.


Building with Top-P (Nucleus Sampling)?

I've shipped this pattern in real production systems. If you want a second pair of eyes on your architecture, that's what I do.