Supporting Multi-class Classification with Neuroflow

Today I worked on extending Neuroflow to support multi-class classification, including the addition of softmax activation and cross-entropy loss.

Overview of Softmax

The softmax function is used in multi-class classification models to convert the raw output scores (logits) from the final layer into probabilities. Each class gets a probability score between 0 and 1, and the sum of all scores is 1. The formula for the softmax function for a vector z = [z_1, z_2, \dots, z_n] is:

\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}

This ensures that the output values are normalized and can be interpreted as probabilities.
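
To make the formula concrete, here is a quick softmax over plain numbers, separate from the autograd engine (the logits are made up for illustration):

const softmax = (logits) => {
  const exps = logits.map((z) => Math.exp(z))
  const sum = exps.reduce((a, b) => a + b, 0)
  return exps.map((e) => e / sum)
}

console.log(softmax([2.0, 1.0, 0.1]))
// ≈ [0.659, 0.242, 0.099], non-negative and summing to 1

In practice, implementations often subtract the largest logit from each z_i before exponentiating to avoid overflow; that detail is skipped here to keep the example minimal.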

Intuition Behind Cross Entropy Loss

Cross-entropy loss, also known as log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. The loss increases as the predicted probability diverges from the actual label. For multi-class classification, the cross-entropy loss for a single instance is calculated as:

L = -\sum_{i=1}^{C} y_i \log(p_i)

where C is the number of classes, y_i is the true label (one-hot encoded), and p_i is the predicted probability for class i.

Illustrative Example

Let's assume a scenario with three classes (0, 1, and 2). If the true label is 1 and the predicted probabilities are [0.1, 0.7, 0.2], the one-hot encoded true label is [0, 1, 0]. The cross-entropy loss for this instance would be:

L = -[0 \cdot \log(0.1) + 1 \cdot \log(0.7) + 0 \cdot \log(0.2)] = -\log(0.7)
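
The same calculation as a quick check in plain JavaScript, using the numbers from the example:

const probs = [0.1, 0.7, 0.2]
const target = [0, 1, 0] // one-hot encoding of the true label 1

const loss = -target.reduce((sum, y, i) => sum + y * Math.log(probs[i]), 0)
console.log(loss) // -log(0.7) ≈ 0.357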

Extending Value to Add Functions

To support these calculations, I needed to extend the Value class to include an exponential (exp()) method, a logarithm (log()) method, and a static softmax function:

class Value {
  constructor(data, _children = [], _op = '') {
    // ...same as before
  }

  exp() {
    const out = new Value(Math.exp(this.data), [this], 'exp')

    const _backward = () => {
      // d(e^x)/dx = e^x, which is already stored in out.data
      this.grad += out.data * out.grad
    }
    out._backward = _backward

    return out
  }

  log() {
    const out = new Value(Math.log(this.data), [this], 'log')

    const _backward = () => {
      // d(ln x)/dx = 1 / x
      this.grad += (1 / this.data) * out.grad
    }
    out._backward = _backward

    return out
  }

  static softmax(values) {
    const expValues = values.map((val) => val.exp())
    const sumExpValues = expValues.reduce((a, b) => a.add(b), new Value(0))
    const outValues = expValues.map((expVal, i) => {
      const out = expVal.div(sumExpValues)
      // Backpropagate the softmax Jacobian straight to the inputs:
      // d s_i / d z_i = s_i * (1 - s_i), and d s_i / d z_j = -s_i * s_j for i != j
      const _backward = () => {
        const softmaxVal = out.data
        values.forEach((val, j) => {
          if (i === j) {
            val.grad += softmaxVal * (1 - softmaxVal) * out.grad
          } else {
            // expValues[j].data / sumExpValues.data is s_j
            val.grad += -softmaxVal * (expValues[j].data / sumExpValues.data) * out.grad
          }
        })
      }
      out._backward = _backward
      return out
    })
    return outValues
  }

  // ...same as before
}
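
With these methods in place, the new operations can be checked directly. The snippet below is a small sanity check that assumes the add, mul, div, and backward() methods already present in the engine from the earlier posts:

const logits = [new Value(2.0), new Value(1.0), new Value(0.1)]
const probs = Value.softmax(logits)

console.log(probs.map((p) => p.data))
// ≈ [0.659, 0.242, 0.099]

// Negative log-probability of class 0, i.e. the cross-entropy term
// when class 0 is the true label
const loss = new Value(-1).mul(probs[0].log())
loss.backward()

console.log(logits.map((z) => z.grad))
// ≈ [-0.341, 0.242, 0.099], i.e. s_0 - 1, s_1, s_2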

Implementing Softmax Layer and Cross Entropy Loss

Softmax Layer

A softmax layer applies the softmax function to the raw outputs of its neurons. This can be added as part of the forward pass in the Layer class:

  forward(inputs) {
    let outputs = this.neurons.map((neuron) => neuron.forward(inputs))
    if (this.activation === 'softmax') outputs = Value.softmax(outputs)
    return outputs.length === 1 ? outputs[0] : outputs
  }
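
As a usage sketch, assuming the Layer constructor from the earlier posts takes the number of inputs, the number of neurons, and an activation name (the exact signature may differ in Neuroflow):

// Hypothetical 3-class output layer over 2-dimensional inputs
const outputLayer = new Layer(2, 3, 'softmax')

const probs = outputLayer.forward([new Value(0.5), new Value(-1.2)])
console.log(probs.map((p) => p.data)) // three probabilities summing to 1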

Cross Entropy Loss

To compute the cross-entropy loss, we first need to one-hot encode the labels. The following helper functions handle the encoding and decoding:

const oneHotEncode = (label, numClasses) => {
  const encoding = Array(numClasses).fill(0)
  encoding[label] = 1
  return encoding
}

const oneHotDecode = (values) => {
  const probs = values.map((v) => v.data)
  return probs.indexOf(Math.max(...probs))
}
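
For example, with three classes:

oneHotEncode(1, 3) // [0, 1, 0]
oneHotDecode([new Value(0.1), new Value(0.7), new Value(0.2)]) // 1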

Then we can compare the predicted probabilities with the true labels (one-hot encoded) to calculate the cross-entropy loss:

const crossEntropyLoss = (predictions, labels) => {
  const n = predictions.length
  return predictions
    .reduce((acc, pred, i) => {
      const label = labels[i]
      // Per-sample loss: -sum_j y_j * log(p_j)
      const loss = pred
        .map((p, j) => new Value(-label[j]).mul(p.log()))
        .reduce((a, b) => a.add(b), new Value(0))
      return acc.add(loss)
    }, new Value(0))
    .div(n) // average over the batch
}
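
Putting the pieces together, here is a sketch of computing the loss for a small batch. The sample points and the outputLayer from the softmax layer example above are hypothetical, and backward() is assumed from the earlier posts:

// Two (x, y) points with integer class labels 1 and 2 (made-up data)
const samples = [
  [new Value(0.5), new Value(-1.2)],
  [new Value(-0.3), new Value(2.0)],
]
const labels = [1, 2].map((label) => oneHotEncode(label, 3))

// Forward pass through the softmax output layer, then average the loss
const predictions = samples.map((sample) => outputLayer.forward(sample))
const loss = crossEntropyLoss(predictions, labels)

loss.backward() // gradients flow back through log, softmax, and the neurons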

With these additions, Neuroflow can support multi-class classification.

Demo

With this engine, I was able to create the following demo that classifies points with multi-class labels on an (x, y) coordinate plane: