- Supporting Multi-class Classification with Neuroflow
- Overview of Softmax
- Intuition Behind Cross Entropy Loss
- Illustrative Example
- Extending `Value` to Add Functions
- Implementing Softmax Layer and Cross Entropy Loss
- Softmax Layer
- Cross Entropy Loss
- Demo

Today I worked on extending Neuroflow to support multi-class classification, including the addition of softmax activation and cross-entropy loss.

The softmax function is used in multi-class classification models to convert the raw output scores (logits) from the final layer into probabilities. Each class gets a probability score between 0 and 1, and the sum of all scores is 1. The formula for the softmax function for a vector $z = [z_1, z_2, \dots, z_n]$ is:

$\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}$

This ensures that the output values are normalized and can be interpreted as probabilities.

Cross-entropy loss, also known as log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. The loss increases as the predicted probability diverges from the actual label. For multi-class classification, the cross-entropy loss for a single instance is calculated as:

$L = -\sum_{i=1}^{C} y_i \log(p_i)$

where $C$ is the number of classes, $y_i$ is the true label (one-hot encoded), and $p_i$ is the predicted probability for class $i$.

Let's assume a scenario with three classes (0, 1, and 2). If the true label is 1 and the predicted probabilities are $[0.1, 0.7, 0.2]$, the one-hot encoded true label is $[0, 1, 0]$. The cross-entropy loss for this instance would be:

$L = -[0 \cdot \log(0.1) + 1 \cdot \log(0.7) + 0 \cdot \log(0.2)] = -\log(0.7)$

To support these calculations, I needed to extend the `Value` class to include an exponential (`exp()`) method, a logarithm (`log()`) method, and a static softmax function:

```
class Value {
  constructor(data, _children = [], _op = '') {
    // ...same as before
  }

  exp() {
    const out = new Value(Math.exp(this.data), [this], 'exp')
    const _backward = () => {
      // d/dx e^x = e^x, which is exactly out.data
      this.grad += out.data * out.grad
    }
    out._backward = _backward
    return out
  }

  log() {
    const out = new Value(Math.log(this.data), [this], 'log')
    const _backward = () => {
      // d/dx ln(x) = 1/x
      this.grad += (1 / this.data) * out.grad
    }
    out._backward = _backward
    return out
  }

  static softmax(values) {
    const expValues = values.map((val) => val.exp())
    const sumExpValues = expValues.reduce((a, b) => a.add(b), new Value(0))
    const outValues = expValues.map((expVal, i) => {
      const out = expVal.div(sumExpValues)
      const _backward = () => {
        const softmaxVal = out.data
        values.forEach((val, j) => {
          if (i === j) {
            // diagonal of the Jacobian: s_i * (1 - s_i)
            val.grad += softmaxVal * (1 - softmaxVal) * out.grad
          } else {
            // off-diagonal: -s_i * s_j
            val.grad += -softmaxVal * (expValues[j].data / sumExpValues.data) * out.grad
          }
        })
      }
      out._backward = _backward
      return out
    })
    return outValues
  }

  // ...same as before
}
```

A softmax layer applies the softmax function to its inputs. This can be added as part of the forward pass in the `Layer` class:

```
forward(inputs) {
  let outputs = this.neurons.map((neuron) => neuron.forward(inputs))
  // for a softmax layer, normalize the raw outputs into probabilities
  if (this.activation === 'softmax') outputs = Value.softmax(outputs)
  return outputs.length === 1 ? outputs[0] : outputs
}
```

To compute the cross-entropy loss, we first need to one-hot encode the labels. The following helper functions handle the encoding and decoding:

```
const oneHotEncode = (label, numClasses) => {
  const encoding = Array(numClasses).fill(0)
  encoding[label] = 1
  return encoding
}

const oneHotDecode = (values) => {
  const probs = values.map((v) => v.data)
  return probs.indexOf(Math.max(...probs))
}
```

Then we can compare the predicted probabilities with the true labels (one-hot encoded) to calculate the cross-entropy loss:

```
const crossEntropyLoss = (predictions, labels) => {
  const n = predictions.length
  return predictions
    .reduce((acc, pred, i) => {
      const label = labels[i]
      // per-instance loss: -sum_j y_j * log(p_j)
      const loss = pred
        .map((p, j) => new Value(-label[j]).mul(p.log()))
        .reduce((a, b) => a.add(b), new Value(0))
      return acc.add(loss)
    }, new Value(0))
    .div(n) // average over the batch
}
```

With these additions, Neuroflow can support multi-class classification.

Using this engine, I was able to build the following demo, which classifies points on an (x, y) coordinate plane into multiple classes: