Large language models (LLMs) are remarkable at language understanding, comprehension, and creative writing, and they are changing how question-and-answer systems are built. During training, you can think of an LLM as compressing world knowledge and storing it in its parameters, while also learning language in the process. Although the model captures world knowledge this way, the compression is very lossy. When building question-and-answer systems, it is important to note that this lossy compression can make retrieving accurate information a challenge. On the other hand, an LLM's language understanding is pivotal for accurately interpreting the user's question and producing text that is both syntactically and semantically correct.

[Figure: qna-intro]

When asking an LLM to generate a response, a few problems can arise. First, if the knowledge required to answer the question was lost during this compression, the LLM may not be able to provide an accurate answer. Second, if the topic of the question was not covered during the LLM's training, or if public knowledge on the subject has changed since training, the LLM may not have the information needed to produce a satisfactory response.

Addressing Data Unavailability and Lossy Compression

To address the issue of unavailable data, we can retrain the LLM on our private data. However, training a language model from scratch comes with significant costs, and as our data evolves we would need to retrain it continually. Since the model's language understanding is not the problem, there is no need for full retraining; instead, we can fine-tune the model to adapt it to our custom data. Fine-tuning is relatively inexpensive, so we can afford to fine-tune as often as needed to keep the LLM up to date with changes. What fine-tuning cannot fix, however, is the loss of accuracy in retrieved information caused by the lossy compression that occurs during training or fine-tuning.

[Figure: rag-intro]

As noted above, LLMs are experts not only in language but also in knowledge. Since we cannot always rely on their knowledge, we can instead make use of their expertise in language. To illustrate this, consider the task of reading comprehension: a strong grasp of the language enables someone to understand a given text and answer questions about it in a grammatically and syntactically sound way. Similarly, we can transform the problem of building a question-and-answer system into a reading comprehension problem, where the LLM is asked to answer using only the context provided. This is essentially how a retrieval-augmented generation (RAG) system works: it leverages the LLM's ability to comprehend provided text. Language models have large context windows that allow relevant text to be placed in the prompt, which the LLM can then use to answer the user's question.

Retrieval-augmented generation is closely analogous to reading comprehension.

What does it mean for an LLM to comprehend?

LLMs can comprehend written words, decode their meaning, and establish connections between sentences using attention, much like humans do in a reading comprehension task. Envisioning a question-and-answer system as a reading comprehension system is the basis for how RAG systems operate. In a reading comprehension task, you may come across a text you are already familiar with, yet be instructed to use only the information provided in the passage and not rely on any outside knowledge when answering questions about it. Similarly, you can instruct an LLM to limit its responses to the given context only.
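As a concrete illustration, the retrieved passages can be embedded in a prompt that explicitly tells the model to stay within them. The sketch below is a minimal, assumed example: the instruction wording, the `build_rag_prompt` helper, and the `llm_complete` function are illustrative placeholders, not any particular library's API.

```python
# A minimal sketch of framing Q&A as reading comprehension.
# The prompt wording and function names are illustrative assumptions.

def build_rag_prompt(question: str, context_passages: list[str]) -> str:
    """Assemble a prompt that restricts the LLM to the supplied context."""
    context = "\n\n".join(context_passages)
    return (
        "Answer the question using ONLY the context below. "
        "If the answer is not contained in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Example usage with a hypothetical llm_complete(prompt) -> str call:
# answer = llm_complete(build_rag_prompt("When was the policy updated?", passages))
```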

This transformation turns the natural language processing (NLP) task of question answering into an efficient and effective search problem. There are two goals: first, to find the relevant passages within your private or public knowledge base; and second, to ensure that the LLM confines its answers to the provided context by tuning the instructions.
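For the first goal, a common approach is to rank passages by the similarity of their embeddings to the question's embedding. The sketch below assumes an embedding function (`embed`) from whatever model you choose; it is a stand-in, not part of any specific library.

```python
# A minimal sketch of the retrieval step: rank passages by cosine similarity
# between a question embedding and pre-computed passage embeddings.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k_passages(question_vec, passage_vecs, passages, k=3):
    """Return the k passages whose embeddings are most similar to the question."""
    scores = [cosine_similarity(question_vec, v) for v in passage_vecs]
    ranked = sorted(zip(scores, passages), key=lambda pair: pair[0], reverse=True)
    return [text for _, text in ranked[:k]]

# Usage (assuming embed(text) -> np.ndarray from your embedding model of choice):
# passage_vecs = [embed(p) for p in passages]
# context = top_k_passages(embed(question), passage_vecs, passages, k=3)
```

The retrieved passages would then be fed to the prompt-building step shown earlier, which addresses the second goal.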

I will explain each of these two goals in a multi-part series.