We’ve talked about how transformers generate predictions, but there’s a crucial step at the end of the process that often gets glossed over: the softmax function. This mathematical function is what lets a model turn raw scores into something meaningful – probabilities. Let’s break down what softmax is, why it’s important, and how it fits into the bigger picture.

What is Softmax?

After a transformer finishes processing an input, it spits out a vector of numbers, one for each possible word in its vocabulary. These numbers (often called logits) are not probabilities yet – they’re just raw, unbounded scores. We need a way to turn these scores into probabilities that sum to 1, so the model can “decide” what word to pick next.

The softmax function takes this vector of scores and squashes them into a probability distribution. The higher the score, the higher the resulting probability – but crucially, all probabilities will add up to 1.

Moreover, because softmax uses exponentials, any score that sits significantly above the rest has its lead magnified: the corresponding token ends up with a probability close to 1, even if the raw gap in scores looks modest.
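Here’s a quick sketch of that magnification effect (the specific score values are just illustrative): two vectors share the same second and third scores, but in the second vector the top score is pushed well above the rest, and the exponentials turn a moderate lead into a near-certain pick.

```python
import math

close = [2.0, 1.8, 1.6]   # scores bunched together
spread = [6.0, 1.8, 1.6]  # one score well above the rest

def to_probs(scores):
    # Exponentiate each score, then normalize by the sum.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print([round(p, 3) for p in to_probs(close)])   # → [0.402, 0.329, 0.269]
print([round(p, 3) for p in to_probs(spread)])  # → [0.973, 0.015, 0.012]
```

With bunched scores the distribution stays fairly flat; raising one score from 2.0 to 6.0 sends its probability from about 40% to about 97%.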

How does softmax work?

The softmax function is quite straightforward! This is how it works:

1) Exponentiate each score in the input vector. In other words, if the score is x, we raise e (the mathematical constant) to the power of x.

2) Divide each exponentiated value by the sum of all the exponentiated values. The result is the probability for the corresponding score/token!

The second step ensures that all the probabilities sum to one, and because exponentials are always positive, every probability also lands between 0 and 1.
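The two steps above can be written as a short function. This is a minimal sketch in plain Python; the max-subtraction line is a standard numerical-stability trick (not part of the definition above – it cancels out in the division, so the result is unchanged, but it prevents overflow for large scores):

```python
import math

def softmax(scores):
    # Numerical-stability trick: shifting all scores by a constant
    # doesn't change the result, so subtract the max before exponentiating.
    m = max(scores)
    # Step 1: exponentiate each (shifted) score.
    exps = [math.exp(s - m) for s in scores]
    # Step 2: divide each exponentiated value by the sum of all of them.
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # example raw scores for a 3-token vocabulary
probs = softmax(logits)
print([round(p, 3) for p in probs])  # → [0.659, 0.242, 0.099]
```

Note how the ordering of the scores is preserved, but the outputs now form a valid probability distribution.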

Where is softmax used?

Aside from the end of the decoder step in transformers, softmax is used in various other applications of AI, such as classification and reinforcement learning. The act of converting a list of arbitrary scores into a list of probabilities is what makes the softmax function so useful and ubiquitous!
