  • What Are LLM Hallucinations? When it comes to LLMs, “hallucinations” refer to instances where the model generates information that is inaccurate, irrelevant, or entirely fabricated. The term is metaphorical, borrowing from human experiences of perceiving things that aren’t real, to describe how an AI model can produce outputs that appear plausible and convincing but are…

  • Common Mistakes in Evaluation One frequent error is over-reliance on a single metric, which fails to capture the multidimensional nature of language tasks. Using outdated benchmarks can misrepresent modern model abilities or ignore emerging challenges. Data leakage—where test data overlaps with training data—can artificially inflate scores and mislead evaluations. Best Practices Combining human and automatic…
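The data-leakage point above can be made concrete with a toy check. This is a minimal sketch, not a production deduplication pipeline: the function name `leakage_overlap` is illustrative, and it only catches verbatim matches (real leakage detection also needs normalization and near-duplicate search).

```python
def leakage_overlap(train_examples, test_examples):
    # Fraction of test examples that appear verbatim in the training data.
    # A non-zero result means benchmark scores may be artificially inflated.
    train_set = set(train_examples)
    hits = sum(1 for ex in test_examples if ex in train_set)
    return hits / len(test_examples)
```

Even this crude verbatim check is worth running before trusting a benchmark score, since a single leaked split can dominate an average.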

  • Benchmarks Benchmarks provide standardized datasets and tasks to compare model performance. Popular benchmarks for LLMs include GLUE and SuperGLUE for language understanding, SQuAD for question answering, as well as more specialized domain tests focused on coding ability or multilingual competence. These benchmarks help track progress and identify gaps across diverse challenges. Core Automatic Metrics Common…
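As one example of an automatic metric, here is a simplified exact-match score of the kind used in SQuAD-style question answering. This is a sketch: the official SQuAD metric also strips punctuation and articles before comparing, which this toy version omits.

```python
def exact_match(predictions, references):
    # Fraction of predictions that equal the reference answer
    # after lowercasing and whitespace normalization.
    def norm(s):
        return " ".join(s.lower().split())
    matches = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return matches / len(references)
```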

  • Evaluating large language models (LLMs) is crucial to ensure they deliver accurate, safe, and useful outputs. After all, without rigorous assessment, models may generate incorrect, biased, or harmful content that undermines trust and viability. Evaluation helps developers understand a model’s strengths and weaknesses, guide improvements, and ensure alignment with practical needs. How do we evaluate…

  • What is Temperature in LLMs? When it comes to LLMs, temperature is a key parameter that controls the randomness or creativity of the text the model generates. It acts like a dial that influences how adventurous or predictable the model’s word choices are when producing language, essentially shaping the style and variety of the output.…
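The "dial" metaphor maps directly onto the math: logits are divided by the temperature before the softmax, so low temperatures sharpen the distribution and high temperatures flatten it. A minimal pure-Python sketch (the function name `sample_with_temperature` is illustrative, not from any particular library):

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    # Scale logits by 1/temperature before softmax:
    # T < 1 sharpens the distribution (more predictable choices),
    # T > 1 flattens it (more adventurous choices).
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

At a very low temperature this behaves like greedy argmax sampling; at a high temperature it approaches a uniform draw over the vocabulary.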

  • What is a Context Window? A context window is essentially the span or range of tokens—units of text like words, subwords, or punctuation—that an AI language model can consider or “remember” at one time. Think of it as the model’s working memory, the active portion of text it analyzes when processing input and generating outputs.…
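The "working memory" idea can be shown with a tiny truncation helper: once a conversation exceeds the window, only the most recent tokens fit and everything earlier is forgotten. A toy sketch with an illustrative function name and a deliberately small window:

```python
def fit_context(tokens, max_context=8):
    # Keep only the most recent tokens that fit in the window;
    # anything earlier falls outside the model's "working memory".
    return tokens[-max_context:]
```

Real systems use smarter strategies (summarizing or selectively retrieving earlier turns), but simple right-truncation like this is the baseline behavior.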

  • We’ve talked about how transformers generate predictions, but there’s a crucial step at the end of the process that often gets glossed over: the softmax function. This mathematical function is what lets a model turn raw scores into something meaningful – probabilities. Let’s break down what softmax is, why it’s important, and how it fits…
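The turning of raw scores into probabilities is short enough to write out in full. A plain-Python sketch of softmax, including the standard max-subtraction trick for numerical stability:

```python
import math

def softmax(logits):
    # Subtract the max logit before exponentiating so large scores
    # don't overflow; this doesn't change the resulting probabilities.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # higher logits get higher probability
```

The outputs are non-negative and sum to 1, which is exactly what lets the model treat them as a probability distribution over the vocabulary.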

  • Continuing from the previous post, let’s now dive into the second half of the transformer – the decoder. If you recall, the decoder takes the contextualized embeddings produced by the encoder from the input sequence. It then generates the output sequence, one token at a time. Let’s break down how this process unfolds, step by…
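The "one token at a time" loop can be sketched independently of any real model. Here `decoder_step` stands in for the whole decoder stack plus sampling (a hypothetical callable, not a real API), and the token IDs are arbitrary:

```python
def generate(decoder_step, max_new_tokens, bos_token=0, eos_token=1):
    # Autoregressive decoding: feed the tokens generated so far back in,
    # appending one new token per step until EOS or the length limit.
    tokens = [bos_token]
    for _ in range(max_new_tokens):
        next_token = decoder_step(tokens)  # stand-in for the decoder + sampler
        tokens.append(next_token)
        if next_token == eos_token:
            break
    return tokens
```

The key property is that each step sees all previously generated tokens, which is why decoding cost grows with output length.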

  • Continuing from the previous post, let’s dive into the first section of the transformer – the encoder. As we discussed, the encoder embeds the input tokens, uses positional encoding and attention to imbue the token embeddings with relevant meaning, and passes the modified embeddings to the decoder. We’ve already covered how token embeddings work, so…
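Of the encoder ingredients mentioned above, positional encoding is the most self-contained to sketch. This follows the sinusoidal scheme from “Attention Is All You Need” (even dimensions use sine, odd use cosine, at geometrically spaced frequencies); written as plain Python lists rather than tensors for readability:

```python
import math

def positional_encoding(seq_len, d_model):
    # pe[pos][i] encodes position pos in embedding dimension i:
    # sin for even i, cos for odd i, with frequency 1 / 10000^(2k/d_model).
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** ((i // 2 * 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe
```

These values are added to the token embeddings so that attention, which is otherwise order-blind, can distinguish positions.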

  • Introduced in the seminal paper “Attention Is All You Need,” the transformer revolutionized the world of natural language processing (NLP) and supercharged the progress of LLMs today. Let’s take a look at how it works. Let’s consider transformers used for causal language modelling. Causal refers to the property of depending only on prior and current…
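The causal property — depending only on prior and current positions — is enforced in attention with a mask. A minimal sketch of the mask itself (a boolean lower-triangular matrix; real implementations apply it as additive negative infinity on the attention logits):

```python
def causal_mask(seq_len):
    # mask[i][j] is True when position i may attend to position j,
    # i.e. only current and earlier positions (j <= i).
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]
```

This is what lets the model be trained on all positions of a sequence in parallel while still generating left to right at inference time.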