How Temperature Controls AI Creativity
Today I came across a great tweet by Akshay Pachaar with a visual explanation that perfectly illustrates how temperature affects a language model's output. Temperature controls how random or creative the model's responses are by shaping the probability distribution over possible next tokens. A low temperature makes the model more deterministic and focused, while a high temperature encourages more diverse, surprising outputs.
Technically, temperature modifies the model's raw output scores (logits) before they're passed through the softmax function to become probabilities. This simple adjustment lets you dial the model's behavior from predictable to inventive with a single parameter.
The Mechanics: Logits, Softmax, and Temperature
When an LLM predicts the next word, it generates a logit score for every possible token in its vocabulary. Higher logits mean a higher likelihood. The softmax function then converts these logits into a probability distribution.
Temperature is applied by dividing the logits by the temperature value before the softmax calculation.
probabilities = softmax(logits / temperature)
Let's see this in action with a simple PyTorch example. Imagine we have four logits for four possible next tokens.
import torch
import torch.nn.functional as F
# Raw logits from the model
logits = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Temperature = 1.0 (The Default)
With a temperature of 1.0, the calculation is just a standard softmax. You can calculate it manually or use PyTorch's built-in function.
# Manual softmax calculation
temp_1 = 1.0
manual_probs_1 = torch.exp(logits / temp_1) / torch.sum(torch.exp(logits / temp_1))
# tensor([0.0321, 0.0871, 0.2369, 0.6439])
# Using PyTorch's softmax function
probs_1 = F.softmax(logits / temp_1, dim=0)
# tensor([0.0321, 0.0871, 0.2369, 0.6439])
The token with logit 4 has a ~64% chance of being selected.
Temperature < 1.0 (Less Random, More Confident)
When the temperature is lowered (e.g., to 0.5), the differences between logits are amplified. This makes the model more confident and deterministic, heavily favoring the token with the highest logit.
# Low temperature (less random)
temp_0_5 = 0.5
manual_probs_0_5 = torch.exp(logits / temp_0_5) / torch.sum(torch.exp(logits / temp_0_5))
# tensor([0.0021, 0.0158, 0.1171, 0.8650])
# Using PyTorch's softmax function
probs_0_5 = F.softmax(logits / temp_0_5, dim=0)
# tensor([0.0021, 0.0158, 0.1171, 0.8650])
Now, the probability of the highest-scoring token has jumped to ~87%. The model becomes more "peaky" and less likely to choose surprising words.
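As an illustrative extension of the same example (my addition, not from the original tweet), pushing the temperature toward zero shows where this trend ends: the distribution collapses almost entirely onto the highest logit, which is effectively greedy (argmax) decoding. F.softmax is used here because the manual exp-based formula would overflow with such extreme scaled logits.
# Illustrative only: a very low temperature approaches greedy decoding
temp_0_01 = 0.01
probs_0_01 = F.softmax(logits / temp_0_01, dim=0)
# approximately tensor([0., 0., 0., 1.])
# In the limit, this is the same as picking the argmax directly
greedy_token = torch.argmax(logits)
# tensor(3)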
Temperature > 1.0 (More Random, More Creative)
Conversely, a higher temperature flattens the probability distribution, making the model's choices more random and creative. Even lower-scoring tokens get a better chance of being selected.
# High temperature (more random)
temp_2 = 2.0
manual_probs_2 = torch.exp(logits / temp_2) / torch.sum(torch.exp(logits / temp_2))
# tensor([0.1015, 0.1674, 0.2760, 0.4551])
# Using PyTorch's softmax function
probs_2 = F.softmax(logits / temp_2, dim=0)
# tensor([0.1015, 0.1674, 0.2760, 0.4551])
The probability distribution is now much flatter. The top token's probability has dropped to ~46%, and the others are closer in likelihood.
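A quick way to see what these numbers mean in practice (this sampling step is my addition, not part of the original example) is to draw next tokens from each distribution with torch.multinomial and count how often each token comes up.
# Illustrative: sample 1,000 next tokens from each distribution
num_samples = 1000
counts_low = torch.bincount(torch.multinomial(probs_0_5, num_samples, replacement=True), minlength=4)
counts_default = torch.bincount(torch.multinomial(probs_1, num_samples, replacement=True), minlength=4)
counts_high = torch.bincount(torch.multinomial(probs_2, num_samples, replacement=True), minlength=4)
# Typical counts (they vary from run to run):
# counts_low     ~ tensor([  2,  16, 117, 865])  -> nearly always the top token
# counts_default ~ tensor([ 32,  87, 237, 644])
# counts_high    ~ tensor([102, 167, 276, 455])  -> much more spread out
At temperature 0.5 the top token dominates almost completely, while at temperature 2.0 the other three tokens together get sampled more often than not.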
Key takeaways:
- Temperature controls randomness: It's a knob for tuning the creativity vs. predictability of an LLM (see the practical sketch after this list).
- Low temperature (< 1.0): Makes the model more deterministic and focused. Good for factual tasks like Q&A or code generation.
- High temperature (> 1.0): Encourages creativity and diversity. Ideal for brainstorming, story writing, or other creative tasks.
- It's all in the math: Temperature works by scaling the logits before the softmax function, which directly reshapes the final probability distribution.
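As a closing sketch (my addition, not from the original post), here is how this knob is typically exposed when generating text with the Hugging Face transformers library, assuming the gpt2 checkpoint is available; generate's temperature argument rescales the logits in the same way before sampling.
# Minimal sketch: assumes the transformers library and the gpt2 checkpoint
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time", return_tensors="pt")

# do_sample=True enables sampling; try temperature=0.5 vs. 1.5 and compare the outputs
output = model.generate(**inputs, do_sample=True, temperature=1.5, max_new_tokens=30)
print(tokenizer.decode(output[0], skip_special_tokens=True))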