If you’re unfamiliar with how Large Language Models (LLMs) – such as those behind chatbots like ChatGPT – work, you may wonder how they can seemingly understand what you’re saying. More impressively, even if you don’t convey your intentions very well, they can often still pick up on what you mean!

Behind these feats lie embeddings, the representation LLMs use to capture the meaning behind text.

Embeddings?

An embedding is just a vector – an ordered list of numbers. Each embedding is quite large, on the scale of hundreds or even thousands of numbers, and it stores a specific semantic meaning – in other words, what the text actually represents. For instance, the word “log” can have multiple meanings depending on the context: a fallen tree, a record of events, or the mathematical function. If we use an embedding to store the meaning of “log”, it will have different values depending on the surrounding context (I’ll explain how identifying the surrounding context works later).
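To make this concrete, here’s a minimal sketch in Python using the sentence-transformers library (the model name all-MiniLM-L6-v2 is just one small, publicly available embedding model I’ve picked for illustration):

```python
from sentence_transformers import SentenceTransformer

# Load a small, publicly available embedding model.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Embed "log" in two different contexts.
fallen_tree = model.encode("The hikers sat on a log beside the trail.")
server_record = model.encode("Check the server log for the error message.")

print(fallen_tree.shape)  # (384,) – a vector of 384 numbers
print(fallen_tree[:5])    # the first few values of the embedding
```

(Strictly speaking, this model embeds whole sentences rather than individual words, but it illustrates the point: the two vectors come out different because the contexts around “log” differ.)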

How do embeddings work?

For now, let’s imagine that an embedding captures the meaning of a single word.

Since each embedding is a vector of some length – say, N – it represents a point in N-dimensional space. As an analogy, a vector containing 2 numbers can identify a point on a piece of paper, and a vector containing 3 numbers can pinpoint a location in 3-D space.
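As a toy illustration (the vectors below are made up, not real embeddings), we can treat each vector as a point and measure how far apart two points are:

```python
import numpy as np

# Three made-up 3-dimensional "embeddings", i.e. points in 3-D space.
cat = np.array([0.9, 0.2, 0.1])
kitten = np.array([0.85, 0.25, 0.15])
bridge = np.array([0.1, 0.8, 0.9])

# Euclidean distance: how far apart two points are.
print(np.linalg.norm(cat - kitten))  # ~0.09 – close together
print(np.linalg.norm(cat - bridge))  # ~1.28 – far apart
```

This “closeness” is exactly what makes embeddings useful, as we’ll see next.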

Embeddings group words with similar semantic meaning close to each other; the more dissimilar two words are, the farther apart their embeddings sit. Remarkably, embeddings have been shown to capture more abstract concepts via specific directions. Here’s a famous example: in the embedding model Word2Vec, the vector difference “king” − “man” roughly equals “queen” − “woman”. This shows that the idea of “monarchy” is captured by a particular direction in N-dimensional space!
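You can try this yourself with the gensim library; here’s a hedged sketch, assuming the pretrained word2vec-google-news-300 vectors (a large download on first run) are available:

```python
import gensim.downloader as api

# Download and load pretrained Word2Vec vectors (several GB on first run).
vectors = api.load("word2vec-google-news-300")

# Rearranged analogy: king − man + woman ≈ ?
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # "queen" should come out on top
```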

How are embeddings made?

It may still feel eerily magical that such a system can be created for something as complex as human language. Embeddings are produced by an embedding model (or an embedding layer inside a larger network), generally a specialized neural network trained so that its outputs accurately capture semantic meaning. This is often done via contrastive learning, a training method that pulls the embeddings of similar texts together and pushes the embeddings of dissimilar texts apart.
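Here’s a deliberately simplified PyTorch sketch of that idea (a toy setup I’ve made up for illustration, not how production embedding models are actually trained): the loss rewards high cosine similarity for a pair we label “similar” and low similarity for a pair we label “dissimilar”.

```python
import torch
import torch.nn.functional as F

# Toy "embedding model": maps a 10-number input to an 8-number embedding.
model = torch.nn.Linear(10, 8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Made-up inputs standing in for three words.
anchor = torch.randn(1, 10)    # e.g. "dog"
positive = torch.randn(1, 10)  # e.g. "puppy" – should end up nearby
negative = torch.randn(1, 10)  # e.g. "carburetor" – should end up far away

for step in range(200):
    a, p, n = model(anchor), model(positive), model(negative)

    # Pull the similar pair together, push the dissimilar pair apart.
    loss = (1 - F.cosine_similarity(a, p)) + F.cosine_similarity(a, n).clamp(min=0)
    loss = loss.mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(F.cosine_similarity(model(anchor), model(positive)).item())  # near 1
print(F.cosine_similarity(model(anchor), model(negative)).item())  # near 0 or below
```

Real contrastive training uses many pairs at once and much larger networks, but the push-pull intuition is the same.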

More on embeddings

While we’ve been discussing embeddings in the context of individual words, they can capture the meaning of sentences, paragraphs, and even whole documents! This ties into the idea of capturing the meaning of a word while taking the surrounding context into account. That concept lies at the heart of LLMs, and it’s implemented via a mechanism called attention. Attention allows an LLM to blend the meaning of the relevant portions of surrounding text into an embedding, letting it capture richer semantic meaning. It’s a rather complex topic, so I won’t get into it in this post.
