Supporting Multi-class Classification with Neuroflow

Today I worked on extending Neuroflow to support multi-class classification, including the addition of softmax activation and cross-entropy loss.

Overview of Softmax

The softmax function is used in multi-class classification models to convert the raw output scores (logits) from the final layer into probabilities. Each class gets a probability score between 0 and 1, and the sum of all scores is 1. The formula for the softmax function for a vector z = [z_1, z_2, \dots, z_n] is:

\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}

This ensures that the output values are normalized and can be interpreted as probabilities.
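
To make the formula concrete, here is a quick softmax over plain numbers, separate from the autograd engine (the logits are made up for illustration):

const softmax = (logits) => {
  const exps = logits.map((z) => Math.exp(z))
  const sum = exps.reduce((a, b) => a + b, 0)
  return exps.map((e) => e / sum)
}

console.log(softmax([2.0, 1.0, 0.1]))
// ≈ [0.659, 0.242, 0.099], non-negative and summing to 1

In practice, implementations often subtract the largest logit from each z_i before exponentiating to avoid overflow; that detail is skipped here to keep the example minimal.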

Intuition Behind Cross Entropy Loss

Cross-entropy loss, also known as log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. The loss increases as the predicted probability diverges from the actual label. For multi-class classification, the cross-entropy loss for a single instance is calculated as:

L = -\sum_{i=1}^{C} y_i \log(p_i)

where C is the number of classes, y_i is the true label (one-hot encoded), and p_i is the predicted probability for class i.

Illustrative Example

Let's assume a scenario with three classes (0, 1, and 2). If the true label is 1 and the predicted probabilities are [0.1, 0.7, 0.2], the one-hot encoded true label is [0, 1, 0]. The cross-entropy loss for this instance would be:

L = -[0 \cdot \log(0.1) + 1 \cdot \log(0.7) + 0 \cdot \log(0.2)] = -\log(0.7)
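
The same calculation as a quick check in plain JavaScript, using the numbers from the example:

const probs = [0.1, 0.7, 0.2]
const target = [0, 1, 0] // one-hot encoding of the true label 1

const loss = -target.reduce((sum, y, i) => sum + y * Math.log(probs[i]), 0)
console.log(loss) // -log(0.7) ≈ 0.357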

Extending Value to Add Functions

To support these calculations, I needed to extend the Value class to include an exponential (exp()) method, a logarithm (log()) method, and a static softmax function:

class Value {
  constructor(data, _children = [], _op = '') {
    // ...same as before
  }

  exp() {
    const out = new Value(Math.exp(this.data), [this], 'exp')

    const _backward = () => {
      // d(e^x)/dx = e^x, which is already stored in out.data
      this.grad += out.data * out.grad
    }
    out._backward = _backward

    return out
  }

  log() {
    const out = new Value(Math.log(this.data), [this], 'log')

    const _backward = () => {
      // d(ln x)/dx = 1 / x
      this.grad += (1 / this.data) * out.grad
    }
    out._backward = _backward

    return out
  }

  static softmax(values) {
    const expValues = values.map((val) => val.exp())
    const sumExpValues = expValues.reduce((a, b) => a.add(b), new Value(0))
    const outValues = expValues.map((expVal, i) => {
      const out = expVal.div(sumExpValues)
      // Backpropagate the softmax Jacobian straight to the inputs:
      // d s_i / d z_i = s_i * (1 - s_i), and d s_i / d z_j = -s_i * s_j for i != j
      const _backward = () => {
        const softmaxVal = out.data
        values.forEach((val, j) => {
          if (i === j) {
            val.grad += softmaxVal * (1 - softmaxVal) * out.grad
          } else {
            // expValues[j].data / sumExpValues.data is s_j
            val.grad += -softmaxVal * (expValues[j].data / sumExpValues.data) * out.grad
          }
        })
      }
      out._backward = _backward
      return out
    })
    return outValues
  }

  // ...same as before
}
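
With these methods in place, the new operations can be checked directly. The snippet below is a small sanity check that assumes the add, mul, div, and backward() methods already present in the engine from the earlier posts:

const logits = [new Value(2.0), new Value(1.0), new Value(0.1)]
const probs = Value.softmax(logits)

console.log(probs.map((p) => p.data))
// ≈ [0.659, 0.242, 0.099]

// Negative log-probability of class 0, i.e. the cross-entropy term
// when class 0 is the true label
const loss = new Value(-1).mul(probs[0].log())
loss.backward()

console.log(logits.map((z) => z.grad))
// ≈ [-0.341, 0.242, 0.099], i.e. s_0 - 1, s_1, s_2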

Implementing Softmax Layer and Cross Entropy Loss

Softmax Layer

A softmax layer applies the softmax function to the raw outputs of its neurons. This can be added as part of the forward pass in the Layer class:

  forward(inputs) {
    let outputs = this.neurons.map((neuron) => neuron.forward(inputs))
    if (this.activation === 'softmax') outputs = Value.softmax(outputs)
    return outputs.length === 1 ? outputs[0] : outputs
  }
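
As a usage sketch, assuming the Layer constructor from the earlier posts takes the number of inputs, the number of neurons, and an activation name (the exact signature may differ in Neuroflow):

// Hypothetical 3-class output layer over 2-dimensional inputs
const outputLayer = new Layer(2, 3, 'softmax')

const probs = outputLayer.forward([new Value(0.5), new Value(-1.2)])
console.log(probs.map((p) => p.data)) // three probabilities summing to 1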

Cross Entropy Loss

To compute the cross-entropy loss, we first need to one-hot encode the labels. The following helper functions handle the encoding and decoding:

const oneHotEncode = (label, numClasses) => {
  const encoding = Array(numClasses).fill(0)
  encoding[label] = 1
  return encoding
}

const oneHotDecode = (values) => {
  const probs = values.map((v) => v.data)
  return probs.indexOf(Math.max(...probs))
}
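
For example, with three classes:

oneHotEncode(1, 3) // [0, 1, 0]
oneHotDecode([new Value(0.1), new Value(0.7), new Value(0.2)]) // 1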

Then we can compare the predicted probabilities with the true labels (one-hot encoded) to calculate the cross-entropy loss:

const crossEntropyLoss = (predictions, labels) => {
  const n = predictions.length
  return predictions
    .reduce((acc, pred, i) => {
      const label = labels[i]
      // Per-sample loss: -sum_j y_j * log(p_j)
      const loss = pred
        .map((p, j) => new Value(-label[j]).mul(p.log()))
        .reduce((a, b) => a.add(b), new Value(0))
      return acc.add(loss)
    }, new Value(0))
    .div(n) // average over the batch
}
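
Putting the pieces together, here is a sketch of computing the loss for a small batch. The sample points and the outputLayer from the softmax layer example above are hypothetical, and backward() is assumed from the earlier posts:

// Two (x, y) points with integer class labels 1 and 2 (made-up data)
const samples = [
  [new Value(0.5), new Value(-1.2)],
  [new Value(-0.3), new Value(2.0)],
]
const labels = [1, 2].map((label) => oneHotEncode(label, 3))

// Forward pass through the softmax output layer, then average the loss
const predictions = samples.map((sample) => outputLayer.forward(sample))
const loss = crossEntropyLoss(predictions, labels)

loss.backward() // gradients flow back through log, softmax, and the neurons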

With these additions, Neuroflow can support multi-class classification.

Demo

With this engine, I was able to create the following demo that classifies points with multi-class labels on an (x, y) coordinate plane: