• Sometimes you may feel like you can’t get an LLM chatbot to do quite what you want. Often, this can be resolved by improving your prompt. Let’s take a look at prompt engineering, the art of constructing an effective prompt!

    Prompts and prompt engineering

    A prompt is simply the textual input you give generative AI models (like LLMs and image generation models), instructing them to perform your desired task. A simple example – whatever you type into ChatGPT is a prompt!

    Prompt engineering is the art of designing prompts so that they are more relevant and effective, and so that they impose some manner of constraint on the model’s output. It’s generally done entirely in natural language, and you can pick and choose which methods you’d like to use, so don’t be intimidated!

    Some prompt engineering techniques

    Here are some commonly used prompt engineering techniques:

    One/Few-shot prompting

    In this technique, “shots” refers to the examples you provide in your prompt. A regular prompt, like “Make a sentence with the word ‘weta’ in it”, is an example of a zero-shot prompt, since you’re not providing any examples. (PS: Google it if you dare)

    An example of a one-shot prompt would be:

    “A weta is an insect endemic to New Zealand. An example of a sentence using the word ‘weta’ is:

    Mark was terrified of the weta crawling on his bedroom floor.

    Create a sentence using the word ‘weta’ in it.”

    As for few-shot prompting, just add a few more examples.
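
    To make this concrete, here’s a minimal Python sketch of assembling a few-shot prompt programmatically. The helper name, the definition, and the second example sentence are made up for illustration; the resulting string can be pasted into any chatbot or passed to a model through whichever API you use.

        # Hypothetical helper that assembles a few-shot prompt from example sentences.
        def build_few_shot_prompt(word: str, definition: str, examples: list[str]) -> str:
            lines = [f"A {word} is {definition}."]
            lines.append(f"Example sentences using the word '{word}':")
            for example in examples:
                lines.append(f"- {example}")
            lines.append(f"Create a new sentence using the word '{word}' in it.")
            return "\n".join(lines)

        prompt = build_few_shot_prompt(
            word="weta",
            definition="an insect endemic to New Zealand",
            examples=[
                "Mark was terrified of the weta crawling on his bedroom floor.",
                "The weta clung to the bark, motionless in the torchlight.",
            ],
        )
        print(prompt)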

    Role prompting

    In this technique, you assign the LLM a particular role or persona. This encourages the model to adopt the expertise level, tone, and perspective of the role you assigned. An example:

    “You are a technical support specialist with expertise in handling network issues. You are also great at explaining solutions to non-technical teammates.

    I’m facing this issue, please help me resolve it: … “
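
    Many chat-style APIs let you set up a role through a “system” message. Here’s a rough Python sketch of that structure; the message contents mirror the example above, and the exact call you’d make with this list depends on your provider, so it’s left out.

        # Role prompting via a system message, as commonly supported by chat-style APIs.
        messages = [
            {
                "role": "system",
                "content": (
                    "You are a technical support specialist with expertise in handling "
                    "network issues. You are also great at explaining solutions to "
                    "non-technical teammates."
                ),
            },
            {
                "role": "user",
                "content": "I'm facing this issue, please help me resolve it: ...",
            },
        ]
        # `messages` would then be passed to your provider's chat completion endpoint.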

    Chain of thought

    In this technique, you encourage the model to work through its solution step by step. This nudges the model to generate its ‘thinking process’, which can be useful added context for complex questions. An example:

    “Find the solutions to this equation, and explain your reasoning step-by-step:

    x^2 + 2x + 1 = 0”
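
    For reference, a step-by-step answer here would look something like: notice that x^2 + 2x + 1 factors as (x + 1)^2, set (x + 1)^2 = 0, and conclude that x = -1 is the only solution (a repeated root). A common shortcut for nudging a model into this mode is simply appending a phrase like “Let’s think step by step” to your prompt.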

  • When we previously discussed embeddings, we talked about being able to imbue an embedding with relevant semantic meaning from the surrounding context. For example, in the sentence “I spoke to Mark but he …”, an LLM needs to work out who the token “he” refers to. The mechanism that makes this possible is called attention. Let’s take a high-level look at what it’s about.

    How attention works (at a high level)

    At the core of each transformer block within a transformer (the revolutionary architecture that made modern LLMs possible) lies the attention layer. An attention head is a component that carries out the attention mechanism, and several of them run in parallel within an attention layer. Essentially, an attention head assigns each preceding token embedding within the context window a relevance score with respect to the current token. In lingo, we say that the current token is attending to the other tokens.

    These relevance scores are then used to weight the other token embeddings, and the weighted results are combined into the current token’s embedding to enrich it with relevant semantic meaning from the surrounding text. This form of attention is called self-attention and lies at the heart of GPT models. I’ll dive deeper into how attention works in a later post.
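
    In the meantime, here’s a minimal numpy sketch of single-head causal self-attention. The dimensions and random weight matrices are made up for illustration, and real implementations add multiple heads, learned per-head projections, and various efficiency tricks, but the core idea is the same: score the preceding tokens, then mix their vectors into the current token’s representation.

        import numpy as np

        def self_attention(x, W_q, W_k, W_v):
            """Single-head causal self-attention over a sequence of token embeddings x."""
            # x: (seq_len, d_model) token embeddings
            Q = x @ W_q  # queries: what is this token looking for?
            K = x @ W_k  # keys: what does each token offer?
            V = x @ W_v  # values: what information does each token carry?

            d_k = K.shape[-1]
            scores = Q @ K.T / np.sqrt(d_k)  # relevance of every token to every other token

            # Causal mask: each token may only attend to itself and preceding tokens.
            seq_len = x.shape[0]
            mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
            scores = np.where(mask, -np.inf, scores)

            # Softmax turns scores into attention weights that sum to 1 per token.
            weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
            weights /= weights.sum(axis=-1, keepdims=True)

            return weights @ V  # each token's output is a weighted mix of value vectors

        # Toy example: 4 tokens, embedding size 8.
        rng = np.random.default_rng(0)
        x = rng.normal(size=(4, 8))
        W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
        print(self_attention(x, W_q, W_k, W_v).shape)  # (4, 8)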

  • When you read any kind of text, you’re able to quite naturally understand what’s written, without giving it much active thought. Take a look at someone learning a new language, however, and you’ll see that when they try to read a sentence, they do so by breaking it down – usually word-by-word, and sometimes breaking down larger words further.

    Similarly, LLMs break down their text inputs into smaller parseable units called tokens. Your first thought might be to break texts down into individual words, and that’s valid! Termed “word tokenization”, it’s a well-known tokenization strategy. However, consider the words “running”, “runner”, and “runners”. When you think about these words, you probably don’t consider them separately. You identify the root of the word – “run” – and that it’s combined with suffixes that slightly modify its meaning.

    This is why subword tokenization is the dominant tokenization method for LLMs. As the name suggests, tokens obtained via this method can be smaller than an entire word – often word roots, prefixes, and suffixes, as described above.
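
    If you’d like to see this in action, here’s a small sketch using OpenAI’s tiktoken library (one tokenizer among many; the exact splits depend on the tokenizer’s learned vocabulary, so treat the output as illustrative).

        import tiktoken  # pip install tiktoken

        enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by several OpenAI models

        for word in ["running", "runner", "runners"]:
            token_ids = enc.encode(word)
            tokens = [enc.decode([tid]) for tid in token_ids]
            print(word, token_ids, tokens)

        # You'll typically see a shared root piece plus suffix pieces,
        # though the exact splits vary from tokenizer to tokenizer.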

    How tokenization plays into LLMs

    An LLM is designed with a certain vocabulary size in mind, which determines how many distinct tokens it can register in its vocabulary. Before the LLM reads any data, an algorithm called a tokenizer breaks the text down into tokens. The tokenizer is trained to generate a token vocabulary of the specified size that fits well with the data it expects to read. In case the LLM encounters unexpected text that doesn’t fit neatly into word or even subword tokens (such as misspelled words), byte tokens are often added to the vocabulary as well, so that the data can still be tokenized. Each of these tokens represents a single byte of data – quite the granular division!

    Since an LLM’s vocabulary is fixed once the tokenizer is trained, each token can be assigned a unique numerical token ID. When text is broken down into tokens, those tokens are represented by their IDs. The LLM then maintains an embedding matrix that maps each token ID to its corresponding token embedding – an embedding that solely represents that token. This way, tokens can be quickly converted into token embeddings for the LLM to use.
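
    Here’s a toy numpy sketch of that lookup step; the vocabulary size, embedding dimension, and token IDs are all made-up figures.

        import numpy as np

        vocab_size = 50_000   # number of tokens in the vocabulary (made-up figure)
        embedding_dim = 768   # length of each token embedding (made-up figure)

        # The embedding matrix: one row (one token embedding) per token ID.
        rng = np.random.default_rng(0)
        embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

        # Suppose the tokenizer turned some text into these token IDs.
        token_ids = [1542, 318, 257, 1332]

        # Converting token IDs into token embeddings is just a row lookup.
        token_embeddings = embedding_matrix[token_ids]
        print(token_embeddings.shape)  # (4, 768)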

  • If you’re unfamiliar with how Large Language Models (LLMs) – such as those behind chatbots like ChatGPT – work, you may wonder how they can seemingly understand what you’re saying. More impressively, even if you don’t convey your intentions very well, they can often pick up on what you want to say!

    Behind these feats lie embeddings, a method by which LLMs can capture the meaning behind text.

    Embeddings?

    An embedding is just a vector – an ordered list of numbers. Each embedding is quite large, on the scale of hundreds or even thousands of numbers. Each embedding vector is able to store a specific semantic meaning – in other words, what the text actually represents. For instance, the word “log” can have multiple meanings depending on the context – a fallen tree, a record of events, or the mathematical function. If we use an embedding to store the meaning of “log”, it will have different values depending on the surrounding context (I’ll explain how identifying the surrounding context works later).

    How do embeddings work?

    For now, let’s imagine that an embedding captures the meaning of a single word.

    Since each embedding is a vector of some length ‘N’, we could say that it represents a point in N-dimensional space. As an analogy, a vector containing 2 numbers could identify a point on a piece of paper, and a vector containing 3 numbers could pinpoint a location in 3-D space.

    Embeddings group words with similar semantic meaning close to each other, and the more dissimilar the words are, the farther apart they sit. Remarkably, embeddings have been shown to capture more abstract concepts via specific directions. Here’s a famous example: in the embedding model Word2Vec, the vector difference “king” minus “man” roughly equals “queen” minus “woman”. This shows that the idea of “monarchy” is captured via a particular direction in N-dimensional space!
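
    If you want to poke at this yourself, the gensim library can download pretrained word vectors for you. A rough sketch, using a small GloVe model rather than Word2Vec to keep the download manageable (“queen” typically appears near the top of the results, though exact rankings vary by model):

        import gensim.downloader as api  # pip install gensim

        # Small pretrained GloVe vectors; "word2vec-google-news-300" also works but is a much larger download.
        vectors = api.load("glove-wiki-gigaword-50")

        # "king" - "man" + "woman" ≈ ?
        print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=5))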

    How are embeddings made?

    It may still feel eerily magical that such a system can be created for something as complex as human language. Embeddings are made via an embedding model or layer, generally a specialized neural network that’s been trained to produce embeddings that accurately capture semantic meaning. This is often done via contrastive learning, a method that teaches the model which pieces of text are similar and which are dissimilar.
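
    As a very loose illustration of the idea (and not how production embedding models are actually trained), here’s a toy cosine-based contrastive loss in numpy: similar pairs are penalized for being far apart, and dissimilar pairs for being too close. In practice, this kind of loss would be minimized with respect to the network that produces the embeddings.

        import numpy as np

        def cosine_similarity(a, b):
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

        def contrastive_loss(a, b, similar: bool, margin: float = 0.2):
            """Toy contrastive objective: pull similar pairs together, push dissimilar pairs apart."""
            sim = cosine_similarity(a, b)
            if similar:
                return 1.0 - sim               # loss shrinks as the pair gets closer
            return max(0.0, sim - margin)      # loss only if a dissimilar pair is too close

        rng = np.random.default_rng(0)
        a, b = rng.normal(size=64), rng.normal(size=64)
        print(contrastive_loss(a, b, similar=True))
        print(contrastive_loss(a, b, similar=False))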

    More on embeddings

    While we’ve been discussing embeddings in the context of individual words, they can capture the meaning of sentences, paragraphs, and even whole documents! This ties into the idea of capturing the meaning of a word while taking into account the surrounding context. This concept lies at the heart of LLMs and is achieved via a mechanism called attention. Attention allows an LLM to imbue the meaning of relevant portions of text into an embedding, allowing it to capture richer semantic meaning. It’s a rather complex topic, so I won’t get into it in this post.
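
    If you’d like to experiment with sentence-level embeddings, the sentence-transformers library makes it easy. A sketch (the model name is one commonly used small model, and the expected similarity pattern is my assumption, so treat the output as illustrative):

        from sentence_transformers import SentenceTransformer, util  # pip install sentence-transformers

        model = SentenceTransformer("all-MiniLM-L6-v2")  # a small, commonly used sentence embedding model

        sentences = [
            "I chopped the fallen log into firewood.",
            "The server writes every request to a log file.",
            "We recorded each incoming request for auditing.",
        ]
        embeddings = model.encode(sentences)

        # Cosine similarity between every pair of sentences.
        print(util.cos_sim(embeddings, embeddings))
        # The two sentences about recording requests would typically score as the most
        # similar pair, even though only one of them contains the word "log".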