You might not have heard of the term foundation model before, but you’ve almost certainly used one. In the context of LLMs (and AI more broadly), a “foundation model” is a model that has been trained on vast swathes of data, such that it can be used across a wide range of general tasks.
LLM foundation models include OpenAI’s GPT series (e.g., GPT-3, GPT-4), Meta’s Llama series, and Anthropic’s Claude series. The release of a foundation model is a significant event because, as the name implies, these are general-purpose models that many LLM applications are built upon. Improved performance in a foundation model is thus akin to raising the floor on which those applications stand.
Moreover, foundation models are expensive to build in terms of computing resources, engineering manpower, and training time. As an example, OpenAI’s GPT-4 was state-of-the-art at the time of its release, boasting an estimated 1.8 trillion parameters. Training it reportedly cost an estimated 79 million USD and took several weeks, even with the compute power at OpenAI’s disposal.
What differentiates foundation models?
A foundation model’s performance can be measured by testing it on a variety of foundation model benchmarks: collections of varied tests across different fields that assess a model’s capabilities. Several factors drive improvements across these benchmarks, and together they are what separate one foundation model from another.
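At its core, benchmark scoring boils down to running the model over a fixed set of test items and counting how often it answers correctly. The sketch below is purely illustrative: `benchmark_accuracy`, the callable `model` interface, and the question format are all assumptions for demonstration, not any real benchmark harness.

```python
def benchmark_accuracy(model, questions):
    """Score a model on a list of multiple-choice questions.

    `model` is any callable mapping a prompt string to an answer label.
    Both the interface and the data format are illustrative assumptions,
    not a real evaluation framework.
    """
    correct = sum(model(q["prompt"]) == q["answer"] for q in questions)
    return correct / len(questions)

# A toy "model" that always answers "B", scored on two questions:
toy_model = lambda prompt: "B"
questions = [
    {"prompt": "2 + 2 = ? (A) 3 (B) 4", "answer": "B"},
    {"prompt": "Capital of France? (A) Paris (B) Rome", "answer": "A"},
]
print(benchmark_accuracy(toy_model, questions))  # 0.5
```

Real benchmark suites work the same way in spirit, just with thousands of items per task and per-task metrics beyond plain accuracy.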
An increased parameter count is perhaps the most straightforward differentiator. Alongside it, improvements to the model’s architecture can increase both benchmark accuracy and efficiency.
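For a sense of where those parameter counts come from, a back-of-the-envelope estimate for a GPT-style decoder-only transformer follows from its hyperparameters alone. The helper below is a hypothetical sketch that ignores biases and layer norms and assumes tied input/output embeddings; the GPT-3 hyperparameters used in the example (hidden size 12,288, 96 layers, ~50k vocabulary) are from its published description.

```python
def estimate_params(d_model, n_layers, vocab_size):
    """Rough parameter count for a GPT-style decoder-only transformer.

    Per layer: ~4*d^2 for the attention projections (Q, K, V, output)
    plus ~8*d^2 for the MLP with a 4x expansion, i.e. ~12*d^2 total,
    ignoring biases and layer norms. Embeddings contribute
    vocab_size * d_model, assuming tied input/output weights.
    """
    per_layer = 12 * d_model ** 2
    return vocab_size * d_model + n_layers * per_layer

# GPT-3's published hyperparameters: d_model=12288, 96 layers, 50257 vocab
print(f"{estimate_params(12288, 96, 50257) / 1e9:.1f}B")  # ≈ 174.6B
```

The estimate lands within about 0.3% of GPT-3’s advertised 175 billion parameters, which is why parameter count scales roughly with (layers × hidden size squared).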
Increased quantity and quality of training data can have a positive effect as well. These days, that includes multimodal data, since foundation models can analyze images and audio in addition to text.
Improved hardware, and better utilization of that hardware, can decrease inference times, allowing the model to respond faster.