Today I learned an intuitive explanation of why neural network computations, especially weighted sums, can be viewed as operations in the log domain: logarithms turn multiplicative interactions into additive ones.

I have Andrej Karpathy's makemore code to thank for this insight.

My notes on why logits.exp() yields what can be seen as "counts".

1. Neurons compute weighted sums

Typically, a neuron calculates a weighted sum like:

z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b

Here, x_i are the inputs, w_i are the neuron's weights, and b is the bias.
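
For concreteness, here is a minimal sketch of that weighted sum in PyTorch, with made-up inputs, weights, and bias (the values are purely illustrative):

import torch

x = torch.tensor([0.5, -1.2, 3.0])   # inputs
w = torch.tensor([0.8, 0.1, -0.4])   # weights
b = torch.tensor(0.2)                # bias

z = w @ x + b                        # w1*x1 + w2*x2 + w3*x3 + b
print(z)                             # a single scalar, roughly -0.72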

2. Logarithms convert products into sums

Suppose you have counts (frequencies, probabilities) c_1, c_2, \dots, c_n. Taking the logarithm turns their product into a sum:

\log(c_1 \cdot c_2 \cdot \dots \cdot c_n) = \log(c_1) + \log(c_2) + \dots + \log(c_n)
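
A quick numerical sanity check of this identity, using a few made-up positive counts:

import torch

c = torch.tensor([2.0, 5.0, 0.5])   # made-up counts
lhs = torch.log(c.prod())           # log(c1 * c2 * c3)
rhs = torch.log(c).sum()            # log(c1) + log(c2) + log(c3)
print(torch.allclose(lhs, rhs))     # True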

3. Weighted sums as logs of powered products

If each count c_i is raised to the power of its weight w_i, the product looks like this:

c_1^{w_1} \cdot c_2^{w_2} \cdot \dots \cdot c_n^{w_n}

Taking the logarithm again turns this into a sum:

\log(c_1^{w_1} \cdot c_2^{w_2} \cdot \dots \cdot c_n^{w_n}) = \log(c_1^{w_1}) + \log(c_2^{w_2}) + \dots + \log(c_n^{w_n}) = w_1 \log(c_1) + w_2 \log(c_2) + \dots + w_n \log(c_n)

4. Mapping back to neurons

Now, if we define each neuron's input x_i as \log(c_i) (and set the bias aside), the neuron's weighted sum is exactly:

w_1 x_1 + w_2 x_2 + \dots + w_n x_n = w_1 \log(c_1) + w_2 \log(c_2) + \dots + w_n \log(c_n)

Exponentiating this sum (logits.exp()) recovers the original product of counts raised to their weights:

\exp(z) = c_1^{w_1} \cdot c_2^{w_2} \cdot \dots \cdot c_n^{w_n}
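
Here is a small sketch tying steps 3 and 4 together, again with made-up counts and weights: treat the inputs as log-counts, compute the weighted sum, and exponentiating the result recovers the product of powered counts.

import torch

c = torch.tensor([2.0, 5.0, 0.5])        # made-up counts
w = torch.tensor([1.5, -0.3, 2.0])       # made-up weights

x = torch.log(c)                         # inputs defined as log-counts
z = w @ x                                # the neuron's weighted sum (bias omitted)

product = (c ** w).prod()                # c1^w1 * c2^w2 * c3^w3
print(torch.allclose(z.exp(), product))  # True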

This explains the annotation in Karpathy's makemore bigram code:

xenc = F.one_hot(xs, num_classes=27).float()  # input: one-hot encoding
logits = xenc @ W                             # predicted log-counts
counts = logits.exp()                         # counts (N)
probs = counts / counts.sum(1, keepdims=True) # probabilities (softmax)

Exponentiating the logits, which can be read as log-counts, turns them back into positive count-like values, which the final line normalizes into probabilities.
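
The exp-then-normalize steps in the snippet amount to a softmax over the logits. Below is a self-contained check of that; note that xs and W here are random stand-ins with the shapes implied by the snippet, not the trained makemore parameters:

import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(2147483647)
xs = torch.randint(0, 27, (5,), generator=g)    # 5 fake bigram inputs
W = torch.randn((27, 27), generator=g)          # random stand-in weight matrix

xenc = F.one_hot(xs, num_classes=27).float()    # (5, 27) one-hot inputs
logits = xenc @ W                               # predicted log-counts
counts = logits.exp()                           # always-positive "counts"
probs = counts / counts.sum(1, keepdims=True)   # rows sum to 1

print(torch.allclose(probs, F.softmax(logits, dim=1)))  # True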

Why this matters

This perspective isn't just mathematical sleight-of-hand—it clarifies why neural networks are effective at modeling multiplicative relationships and log-likelihoods. When we use logarithms, complex multiplicative interactions naturally simplify into additive operations that neurons easily handle.

Takeaways:

  • Using logarithms converts multiplicative interactions into additive ones, simplifying computations.
  • logits.exp() yields count-like values because the weighted sum that produced the logits can be read as a sum of weighted log-counts, i.e., the log of a product of counts.
  • Understanding neural computations this way makes deeper neural network behaviors intuitive, especially for probability modeling.