RAG, or retrieval-augmented generation, is a technique that allows LLMs to access external sources of data. Normally, an LLM can only rely on the prompt fed to it and the knowledge baked into its parameters. RAG vastly expands the model's capabilities: it can draw on external data sources and incorporate them into its output. This proves to be a very versatile technique, allowing an LLM to incorporate web search results, private documents, and more.
How does RAG work?
Fundamentally, RAG is rather straightforward. Once the system receives a prompt, we use that prompt to find relevant data from whatever source our RAG system is using. Once we have that, we narrow it down to the most relevant results, then allow the LLM to read that data along with the input prompt. A minimal sketch of that flow is shown below; I'll go in-depth into how RAG works in a later post.
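To make the flow concrete, here's a minimal sketch in Python. The document store, the keyword-overlap retriever, and the final print standing in for the LLM call are all simplified placeholders I've made up for illustration; a real system would typically use an embedding model, a vector database, and an actual LLM API in their place.

```python
import re

# Hypothetical document store; a real system might hold thousands of documents
# in a vector database rather than a short Python list.
DOCUMENTS = [
    "Our return policy allows refunds within 30 days of purchase.",
    "Support hours are 9am to 5pm, Monday through Friday.",
    "Premium subscribers receive priority email support.",
]

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(prompt: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by word overlap with the prompt and keep the top_k.

    Word overlap is a crude stand-in for the embedding-based similarity
    search most RAG systems use.
    """
    prompt_tokens = tokenize(prompt)
    ranked = sorted(docs, key=lambda d: len(prompt_tokens & tokenize(d)), reverse=True)
    return ranked[:top_k]

def build_augmented_prompt(prompt: str, context: list[str]) -> str:
    """Prepend the retrieved passages so the LLM reads them alongside the question."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using the context below.\n\nContext:\n{context_block}\n\nQuestion: {prompt}"

user_prompt = "How many days do I have to return a purchase for a refund?"
context = retrieve(user_prompt, DOCUMENTS)
augmented_prompt = build_augmented_prompt(user_prompt, context)
print(augmented_prompt)  # This string is what would be sent to the LLM to generate the final answer.
```

Even in this toy version, the three steps from above are visible: retrieve candidate data, narrow it to the most relevant results, and hand it to the model alongside the original prompt.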
Why use RAG?
Think of RAG like handing an LLM a textbook, or giving it access to Google. By backing the LLM up with a data source, it can generate answers based on up-to-date data, as well as data from private or specialized sources. All this without retraining the model! Additionally, RAG helps reduce hallucinations, since the LLM has a concrete repository of data to ground its answers in rather than relying on memory alone.
Where is RAG used?
Aside from search-backed LLM applications like Perplexity, RAG is used in areas that require the LLM to use private or specialized data.
For instance, a customer support chatbot could utilize RAG to reference private company policy documents, and thus generate answers that accurately and specifically apply to that company.
Similarly, RAG can power internal search tools, for example at companies and legal firms, to semantically sift through the large swathes of private data they hold (see the sketch of that kind of semantic search below). The sky really is the limit when it comes to RAG: think of any use case where an LLM foundation model might not have sufficient knowledge, and you can probably use RAG to supercharge a foundation model for it!
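As a rough illustration of the semantic search step such internal tools rely on, here's a sketch that ranks documents by cosine similarity between embeddings. The filenames and tiny three-dimensional vectors are made up for illustration; in practice an embedding model produces vectors with hundreds of dimensions, and a vector database handles storage and lookup.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Measure how similar two embedding vectors are (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical document embeddings; real embeddings come from an embedding model.
doc_embeddings = {
    "employment_contract.txt": np.array([0.9, 0.1, 0.2]),
    "nda_template.txt":        np.array([0.2, 0.8, 0.3]),
    "office_lease.txt":        np.array([0.1, 0.2, 0.9]),
}

# Embedding of the user's question, produced by the same (hypothetical) model.
query_embedding = np.array([0.85, 0.15, 0.25])

# Rank documents by similarity to the query so the best matches surface first.
ranked = sorted(
    doc_embeddings.items(),
    key=lambda item: cosine_similarity(query_embedding, item[1]),
    reverse=True,
)
for name, _ in ranked:
    print(name)
```

Because the comparison happens in embedding space rather than on raw keywords, documents can match a query even when they share no exact words with it, which is what makes this kind of search "semantic."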