Why logits.exp() Equals Counts
Today I learned an intuitive explanation of why neural network computations, especially weighted sums, can be viewed as operations in the log domain: logarithms turn multiplicative interactions into additive ones.
I have Andrej Karpathy's makemore code to thank for this insight.
These are my notes on why logits.exp() yields what can be seen as "counts".
1. Neurons compute weighted sums
Typically, a neuron calculates a weighted sum like:
$$z = \sum_i w_i x_i$$
Here, $x_i$ are the inputs, and $w_i$ are the neuron's weights.
2. Logarithms convert products into sums
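Suppose you have counts (frequencies, probabilities) $c_1, c_2, \dots, c_n$. The logarithm of their product is the sum of their logarithms:
$$\log(c_1 c_2 \cdots c_n) = \log c_1 + \log c_2 + \cdots + \log c_n = \sum_i \log c_i$$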
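A quick numerical check of the product-to-sum rule can make this concrete. This is only a sketch: the three counts are hypothetical values picked for illustration, and it uses nothing beyond Python's standard `math` module.

```python
import math

# Hypothetical counts, chosen only for illustration.
counts = [3.0, 5.0, 2.0]

# Logarithm of the product: log(3 * 5 * 2).
log_of_product = math.log(math.prod(counts))

# Sum of the individual logarithms: log 3 + log 5 + log 2.
sum_of_logs = sum(math.log(c) for c in counts)

# The two values agree up to floating-point rounding.
assert math.isclose(log_of_product, sum_of_logs)
```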
3. Weighted sums as logs of powered products
If each count $c_i$ is raised to the power of its weight $w_i$, the product looks like this:
$$\prod_i c_i^{w_i}$$
Taking the logarithm again turns this into a sum:
$$\log \prod_i c_i^{w_i} = \sum_i w_i \log c_i$$
4. Mapping back to neurons
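Now, if we define each neuron's input as $x_i = \log c_i$, the neuron's weighted sum exactly equals:
$$z = \sum_i w_i x_i = \sum_i w_i \log c_i = \log \prod_i c_i^{w_i}$$
Exponentiating this sum (logits.exp()) recovers the original product of counts raised to their weights:
$$e^{z} = \prod_i c_i^{w_i}$$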
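The identity above can be checked with a toy neuron. The sketch below assumes hypothetical counts and weights, and `neuron` is an illustrative helper rather than anything from makemore. It feeds the neuron log-counts and compares the exponentiated output with the product of counts raised to their weights.

```python
import math

def neuron(weights, inputs):
    """Plain weighted sum: z = sum_i w_i * x_i."""
    return sum(w * x for w, x in zip(weights, inputs))

# Hypothetical counts and weights, chosen only for illustration.
counts = [4.0, 9.0, 2.0]
weights = [0.5, 0.5, 3.0]

# Feed the neuron log-counts instead of raw counts.
inputs = [math.log(c) for c in counts]
z = neuron(weights, inputs)

# Product of counts raised to their weights: 4**0.5 * 9**0.5 * 2**3 = 48.
powered_product = math.prod(c ** w for c, w in zip(counts, weights))

# Exponentiating the weighted sum recovers that product (up to rounding).
assert math.isclose(math.exp(z), powered_product)
```

With these values the neuron's output is about log 48, so exponentiating it gives roughly 48.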
This explains the annotation in Karpathy's makemore bigram code:
```python
xenc = F.one_hot(xs, num_classes=27).float()  # input: one-hot encoding
logits = xenc @ W                             # predicted log-counts
counts = logits.exp()                         # counts (N)
probs = counts / counts.sum(1, keepdims=True) # probabilities (softmax)
```
Exponentiating logits (which represent log-counts) transforms them back into actual counts.
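To see the same pipeline on numbers small enough to follow by hand, here is a dependency-free sketch. It assumes a hypothetical 3-class weight matrix whose entries are logs of made-up counts (makemore uses 27 classes and a learned `W`), and `one_hot` and `matvec` are illustrative helpers standing in for `F.one_hot` and `@`.

```python
import math

def one_hot(index, num_classes):
    """Return a list with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

def matvec(x, W):
    """Row vector x (length n) times matrix W (n rows, m columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

# Hypothetical weights: each entry is the log of a made-up count.
W = [
    [math.log(1.0), math.log(2.0), math.log(1.0)],
    [math.log(3.0), math.log(1.0), math.log(4.0)],
    [math.log(2.0), math.log(2.0), math.log(6.0)],
]

xenc = one_hot(1, 3)                    # input: one-hot encoding of class 1
logits = matvec(xenc, W)                # selects row 1 of W: the log-counts
counts = [math.exp(l) for l in logits]  # approximately [3.0, 1.0, 4.0]
total = sum(counts)
probs = [c / total for c in counts]     # approximately [0.375, 0.125, 0.5]
```

Because the input is one-hot, the matrix product simply picks out one row of `W`, so in this setting it is the weights themselves that play the role of log-counts.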
Why this matters
This perspective isn't just mathematical sleight-of-hand—it clarifies why neural networks are effective at modeling multiplicative relationships and log-likelihoods. When we use logarithms, complex multiplicative interactions naturally simplify into additive operations that neurons easily handle.
Takeaways:
- Using logarithms converts multiplicative interactions into additive ones, simplifying computations.
- logits.exp() yields actual counts because the original computations were implicitly performed using logarithms.
- Understanding neural computations this way makes deeper neural network behaviors intuitive, especially for probability modeling.