Understanding Key AI Language Model Parameters: top_p, Temperature, num_beams, and do_sample

Omar Santos
4 min read · Dec 15, 2023

I am often asked: what are the parameters top_p, temperature, num_beams, and do_sample when you use large language models (LLMs) like GPT-4, Mistral, Falcon, and Orca2? More importantly, why are they the key parameters that control how these models generate text? I am writing this article to help define these concepts for both AI enthusiasts and professionals. As AI continues to evolve (and it is changing by the minute), knowing these parameters will become increasingly important if you want to dive deep into this technology.

top_p: Tailoring Creativity with Nucleus Sampling

The top_p parameter is central to a technique known as nucleus sampling. It's a method to balance randomness and predictability in text generation.

  • How Does It Work? Imagine the model predicting the next word in a sentence. top_p sets a cumulative probability threshold: the model restricts its choices to the smallest set of most likely next words whose combined probability meets or exceeds this threshold, then samples from within that set (a short sketch follows this list).
  • Impact: A lower top_p (e.g., 0.2) means more predictable text, as only the most probable words are considered. A higher top_p (close to 1) incorporates less likely words, adding creativity but potentially reducing coherence.
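To make this concrete, here is a minimal sketch of nucleus sampling in Python with NumPy. The function name nucleus_sample and the toy five-word distribution are illustrative only; real libraries implement this for you under the hood:

```python
import numpy as np

def nucleus_sample(probs, top_p=0.9):
    """Sample one token from the smallest set of most likely tokens
    whose cumulative probability meets or exceeds top_p."""
    rng = np.random.default_rng()
    order = np.argsort(probs)[::-1]                  # token ids, most likely first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1  # first index crossing top_p
    nucleus = order[:cutoff]                         # the "nucleus" of candidates
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize
    return rng.choice(nucleus, p=nucleus_probs)

# Toy next-word distribution over a 5-word vocabulary:
probs = np.array([0.50, 0.25, 0.15, 0.07, 0.03])
print(nucleus_sample(probs, top_p=0.8))  # only words 0, 1, and 2 can be chosen
```

With top_p=0.8, words 3 and 4 are excluded entirely; raising top_p toward 1 lets them back in, which is exactly the creativity-versus-coherence trade-off described above.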

Temperature: Adjusting the Probability Landscape

Temperature is another critical parameter that “fine-tunes” the randomness in text generation.

  • Functionality: It rescales the probability distribution of the next word; under the hood, the model's raw scores (logits) are divided by the temperature before the softmax. A temperature of 1 means no change. Below 1, the distribution sharpens and the model becomes more conservative, favoring the most likely words. Above 1, the distribution flattens and lower-probability words get a fighting chance.
  • Outcome: A high temperature leads to more diverse and sometimes off-beat text, while a low temperature results in safer, more expected outputs.
  • A widely used method to reduce hallucinations is to ground the model's inference in source documents, so that its predictions stay closely tied to (and derived from) those sources. This approach, however, trades output diversity for accuracy relative to the source. To ease that trade-off, Google research on KL-Divergence Guided Temperature Sampling proposes relaxing the fixed-temperature constraint during decoding and instead adjusting the temperature dynamically, based on each step's relevance to the source, with KL-divergence as the guide.
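Here is the basic temperature mechanic as a short NumPy sketch, applied to a hypothetical three-word distribution (in a real model, the division happens on the logits just before the softmax):

```python
import numpy as np

def apply_temperature(logits, temperature=1.0):
    """Turn raw model scores (logits) into probabilities, scaled by
    temperature: T < 1 sharpens the distribution, T > 1 flattens it."""
    scaled = logits / temperature
    exp = np.exp(scaled - scaled.max())  # subtract the max for numerical stability
    return exp / exp.sum()

logits = np.array([4.0, 2.0, 1.0])      # hypothetical scores for three words
print(apply_temperature(logits, 0.5))   # ~[0.98, 0.02, 0.00] -> conservative
print(apply_temperature(logits, 1.0))   # ~[0.84, 0.11, 0.04] -> unchanged
print(apply_temperature(logits, 2.0))   # ~[0.63, 0.23, 0.14] -> adventurous
```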

num_beams: Exploring Paths with Beam Search

The num_beams parameter is integral to a method called beam search, which impacts the quality and diversity of the generated text.

  • Mechanism: With num_beams set to 1, the model uses greedy decoding, picking the single most likely next word at each step. Increasing num_beams lets the model keep several candidate continuations or 'beams' alive in parallel, extending each one and retaining only the highest-scoring sequences at every step (see the sketch after this list).
  • Consequences: More beams mean the model can find higher-quality, more globally coherent text, but at the cost of computational resources and time.
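The toy sketch below shows the core loop of beam search. The step_probs function stands in for the model, and every name here is illustrative rather than any library's API:

```python
import numpy as np

def beam_search(step_probs, num_beams=3, length=4):
    """Toy beam search: step_probs(seq) returns the next-token
    probability distribution given the tokens generated so far."""
    beams = [((), 0.0)]                 # (token sequence, log-probability)
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            for token, p in enumerate(step_probs(seq)):
                candidates.append((seq + (token,), score + np.log(p)))
        # Keep only the num_beams highest-scoring partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams[0][0]                  # the highest-probability sequence found

# A stand-in "model" that always predicts the same 3-token distribution:
print(beam_search(lambda seq: np.array([0.6, 0.3, 0.1]), num_beams=3))
```

With num_beams=1 this collapses into greedy decoding; each extra beam costs proportionally more compute per step, which is the trade-off noted above.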

do_sample: The Deterministic vs. Stochastic Dilemma

do_sample is a boolean parameter that influences the fundamental approach to text generation.

  • Mode of Operation: When do_sample is False, the model decodes deterministically, picking the most probable next word (or the most probable beam). When True, it samples from the probability distribution, allowing a wider range of word choices.
  • Effects: Setting do_sample to True increases variability and creativity in the model's outputs, whereas False leads to predictable, reproducible text. Note that sampling controls such as top_p and temperature only take effect when do_sample is True.
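All four parameters in this article map directly onto the Hugging Face transformers generate() API, which makes a side-by-side comparison easy. A minimal sketch, using GPT-2 purely as a small stand-in model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Once upon a time", return_tensors="pt")

# do_sample=False: deterministic decoding, identical output on every run.
greedy = model.generate(**inputs, do_sample=False, max_new_tokens=20)

# do_sample=True: stochastic decoding, shaped by temperature and top_p.
sampled = model.generate(**inputs, do_sample=True,
                         temperature=0.8, top_p=0.9, max_new_tokens=20)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(sampled[0], skip_special_tokens=True))
```

Run the script twice: the greedy output never changes, while the sampled output usually does.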

Understanding these parameters is important for anyone looking to harness the full potential of AI language models. By playing with these settings, one can strike a balance between creativity and coherence.

What if you want to create code using AI?

When using AI to generate code or programming-related content, the ideal settings for parameters like top_p, temperature, num_beams, and do_sample should prioritize accuracy over creativity.

A moderate-to-low top_p value (e.g., 0.2 to 0.4) works well. In coding, accuracy and relevance to the given context are crucial, and a lower top_p helps maintain focus and filter out less probable, potentially irrelevant suggestions.

A lower temperature (no more than 0.5) helps generate more predictable, standard code snippets, since high variability is rarely desirable in code generation.

A moderate num_beams value (e.g., 5 to 10) lets the model explore a reasonable number of alternatives for each token, balancing quality against computational cost. In code generation, considering multiple paths can help find syntactically and semantically correct code.

You typically set do_sample to False for coding tasks, so decoding is deterministic and reproducible. (Recall that top_p and temperature then have no effect unless you re-enable sampling.)
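Putting these recommendations together in one generate() call, again with GPT-2 as a stand-in (for real work you would pick a code-oriented model, and these values are starting points rather than a prescription):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # stand-in; use a code model
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("def fibonacci(n):", return_tensors="pt")

# Deterministic, beam-searched decoding: a sensible baseline for code.
out = model.generate(**inputs, do_sample=False, num_beams=5, max_new_tokens=64)

# If you want a little variability, sample conservatively instead:
# out = model.generate(**inputs, do_sample=True, top_p=0.3, temperature=0.3,
#                      max_new_tokens=64)

print(tokenizer.decode(out[0], skip_special_tokens=True))
```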

Remember, these are starting points; optimal settings vary by use case, so experiment with these parameters to find the best configuration for your needs.


Focused on cybersecurity, AI security, vulnerability research, & disclosure. Co-lead of the DEF CON Red Team Village. Author of over 25 books.