As OpenAI’s models have progressed over the years, you may have heard about new releases in headlines or in passing. But beyond the base GPT version numbers, the naming conventions can seem confusing. 4o, .5, turbo? What does any of it mean? Let’s take a look, starting with the basics.

Base GPT – major versions

The major versions of the base GPT models – that is, GPT-2, GPT-3, and GPT-4 – are named as such because each version represents a major leap in capabilities. To get it out of the way, “GPT” stands for generative pre-trained transformer; in other words, these are OpenAI’s transformer-based large language models.

To put the leaps in progress into perspective, let’s look at the models’ parameter counts.

GPT-2: 1.5 billion

GPT-3: 175 billion

GPT-4: an estimated 1.8 trillion (OpenAI has not disclosed an official figure)

Other features of the models, such as context window length, multimodality, and the quantity and quality of training data, improved as well.
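A quick back-of-the-envelope calculation makes those jumps concrete. This is just a sketch using the parameter counts listed above; keep in mind the GPT-4 figure is only an estimate.

```python
# Rough scale of the jumps between base GPT versions, using the
# parameter counts listed above (the GPT-4 figure is an unconfirmed estimate).
param_counts = {
    "GPT-2": 1.5e9,    # 1.5 billion
    "GPT-3": 175e9,    # 175 billion
    "GPT-4": 1.8e12,   # ~1.8 trillion (estimated, never officially disclosed)
}

versions = list(param_counts)
for prev, curr in zip(versions, versions[1:]):
    factor = param_counts[curr] / param_counts[prev]
    print(f"{prev} -> {curr}: ~{factor:.0f}x more parameters")
# GPT-2 -> GPT-3: ~117x more parameters
# GPT-3 -> GPT-4: ~10x more parameters
```

Even if the GPT-4 estimate is off, the pattern is clear: each major version is an order-of-magnitude jump or more.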

“o”-models

It gets a little confusing here. When the “o” comes after the number, like in GPT-4o, it stands for “omni”, signifying the model’s capability to handle multimodal input and output – text, vision, and audio.

When the “o” comes before the number, as in the o1 model, it denotes a class of models that specialize in advanced reasoning, such as math, science, and programming.

mini

These models, as the name suggests, are distilled versions of their “full” counterparts. They have fewer parameters and thus sacrifice some accuracy and reasoning ability. In return, they are faster and cheaper to run, making them ideal for use cases where deep reasoning matters less, such as chatbots and some retrieval-augmented generation (RAG) applications.
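To tie the naming scheme together, here’s a hypothetical (and deliberately simplified) helper showing how these conventions might guide model choice in code. The model-name strings are real OpenAI identifiers, but the selection logic is purely illustrative, not an official recommendation.

```python
# Hypothetical helper mapping the naming conventions to use cases.
# Model-name strings are real OpenAI identifiers; the selection logic
# is a simplification for illustration only.
def pick_model(needs_deep_reasoning: bool, cost_sensitive: bool) -> str:
    if needs_deep_reasoning:
        # "o"-before-the-number models specialize in advanced reasoning
        return "o1-mini" if cost_sensitive else "o1"
    # "o"-after-the-number models are the multimodal "omni" line;
    # the "mini" variant trades some capability for speed and cost
    return "gpt-4o-mini" if cost_sensitive else "gpt-4o"

# A cheap chatbot backend would land on the distilled omni model:
print(pick_model(needs_deep_reasoning=False, cost_sensitive=True))  # gpt-4o-mini
```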
