Computer, Enhance: How to Think About LLMs Effectively
The size of the problem-space addressed by LLMs may seem dizzying at first glance, but with some deeper reflection there's a clear way to think about them that's both simple and powerful. Having a mental framework in place to organise your approach to building next-gen AI apps will give your designs greater clarity, give you a coherent methodology for iterating on them, and reduce the time it takes to ship your product. This mental framework all comes down to the idea of fidelity.
Language and Fidelity
Image downscaling, with zoom to show loss of fidelity
Anyone who's used image-editing software is familiar with scaling: an image can be made larger, smaller, taller, wider - whatever we want. We're also aware that most images are made up of a finite number of pixels, so when we make an image smaller, we necessarily decrease the number of pixels in it. Various algorithms exist to decide which pixels to keep and which to throw out, but the inescapable fact is that we are reducing the amount of information in the image - its fidelity has decreased. We could do something similar with an audio track - take an mp3 file and reduce its bitrate. That would also be a reduction in fidelity. So we can think of fidelity as the amount of detail something contains.

Another type of information we can represent at different levels of fidelity is language. Compare the text "a recipe for sponge cake" with "beat 200g of sugar in a large mixing bowl with 200g of softened butter. add 4 large eggs and..." (and so on). These are two representations of the same idea: it's just that one of them contains more detail than the other.
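To make the image example concrete, here's a minimal sketch of a fidelity-reducing downscale. It assumes the Pillow library is available and uses "photo.jpg" as a placeholder for any reasonably large image:

```python
# A minimal sketch of fidelity reduction, assuming Pillow is installed
# and "photo.jpg" is a placeholder for any reasonably large image.
from PIL import Image

img = Image.open("photo.jpg")        # e.g. 3000 x 2000 pixels
small = img.resize((300, 200))       # keep only 1% of the pixels

# Information has been discarded: resizing `small` back up cannot
# recover the detail that was thrown away - fidelity has decreased.
print("before:", img.width * img.height, "pixels")
print("after: ", small.width * small.height, "pixels")
```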
The Super-Power of LLMs
Image super-resolution scaling (left) vs classical bicubic scaling (right). Super-resolution increases the fidelity of the input by adding information that wasn’t in the original image, which is why the image on the left is sharper.
The most powerful new capability of AI applications is their ability to increase the fidelity of textual input. Consider summarising a long document: this is a transformation that reduces fidelity - the output contains less detail than the input. Translating an English sentence into French is a transformation that preserves fidelity. Both are impressive and useful applications of software, but the greatest value comes from LLMs performing tasks like translating the string “a Python function which adds two numbers together” into the actual code which implements that function, or even just answering questions. In these examples, the semantic content of the output is necessarily greater than that of the input. The extra information needs to come from somewhere to supplement the input data. For basic LLM prompting, it comes from a latent representation learned from the training dataset.
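As a sketch of what such a fidelity-increasing transformation looks like in code, the snippet below uses a hypothetical `complete` function as a stand-in for whichever LLM client you happen to use; only the shape of the call matters here:

```python
# `complete` is a hypothetical placeholder for an LLM call - wire it up
# to your model provider of choice.
def complete(prompt: str) -> str:
    """Send `prompt` to an LLM and return the generated text."""
    raise NotImplementedError

low_fidelity_input = "a Python function which adds two numbers together"
high_fidelity_output = complete(low_fidelity_input)

# We would expect something like:
#
#   def add(a, b):
#       return a + b
#
# The output carries more semantic content than the input; the extra
# information comes from the model's latent representation of its
# training data.
```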
Introducing Your Own Data - The Hard Way
At times, we may want our app to perform a transformation requiring data not present in the training dataset. The first instinct may be to assume we must introduce the necessary context into the training data of the underlying model itself. This is referred to as ‘model fine-tuning’ - essentially continuing the model training process with relevant data to tune performance against a given class of documents. Fine-tuning, in effect, expands the amount of latent context accessible to the model, which in turn expands the set of information that can be leveraged to perform fidelity transformations. It is a valid way of solving this problem, but it’s worth being aware of the costs involved. Chiefly, embedding information into model weights requires large quantities of data, as well as sufficient computing resources to train the many billions of weights in a language model. It’s also simply not possible for closed-source models, since we don’t have access to the underlying weights. So there are some very real restrictions on our ability to leverage fine-tuning when we wish to integrate new data into a language-model-powered application. There is, however, another way to frame this problem.
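For illustration, fine-tuning data is typically assembled as prompt/completion pairs. The snippet below is only a hedged sketch - the field names and example rows are placeholders, not any particular provider's schema:

```python
# Illustrative only: field names and rows are placeholders, not a
# specific vendor's fine-tuning format.
import json

examples = [
    {"prompt": "Summarise our returns policy.",
     "completion": "Items can be returned within 30 days of purchase..."},
    {"prompt": "Which regions does the standard warranty cover?",
     "completion": "The standard warranty covers purchases made in..."},
]

with open("finetune.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

# Continued training on many examples like these bakes the new
# information into the model weights - effective, but data- and
# compute-hungry, and impossible without access to those weights.
```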
Introducing Your Own Data - The Easy Way
A very simplified mental model for RAG
If we augment our prompt with additional information, we can reframe the problem (from the LLM’s perspective) as a decrease in fidelity - the model no longer needs to rely on latent context to “fill in the blanks”; its task is simply to summarise the information provided. The problem then becomes ‘how do we know what information is relevant to the task at hand?’. Luckily, this reduces to an information retrieval problem - and information retrieval is a very well-studied field in computer science. We can treat the initial user prompt as a search query against an index of supplementary data, and include the search results in the prompt we send to the model. Information retrieval is mature enough these days to be almost considered a ‘solved problem’, so we see good results with this approach - particularly if we use best-in-class techniques such as document vectorization. This flow, in which we augment the prompt with additional context at query time, is referred to as Retrieval-Augmented Generation, or RAG. RAG is a much simpler method of exposing private data to next-gen AI apps, but it comes with its own limitations. Namely, there are limits to the volume of information we can include in a prompt - if the task at hand requires context that is more structural than factual, we may find that RAG pipelines struggle to fit sufficient information into the model prompt.
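Here is a minimal sketch of that flow. Retrieval is done with TF-IDF from scikit-learn purely for brevity - a production pipeline would more likely use dense vector embeddings - the example documents are placeholders, and `complete` is the same hypothetical LLM call sketched earlier:

```python
# A minimal RAG sketch: retrieve relevant documents, stuff them into the
# prompt, and let the model summarise rather than invent.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [  # placeholder private data
    "Sponge cake: beat 200g sugar with 200g softened butter, add 4 eggs...",
    "The staging environment is rebuilt every night at 02:00.",
    "Invoices are payable within 30 days of issue.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)   # index the supplementary data

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most relevant to the query."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

def rag_answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    # The relevant facts are now in the prompt, so the model's job is
    # closer to summarisation (a fidelity decrease) than to invention.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return complete(prompt)   # hypothetical LLM call, as sketched earlier
```

Calling `rag_answer("how do I make a sponge cake?")` would pull the recipe document into the prompt before the model ever sees the question - the retrieval step supplies the detail, rather than the model's latent context.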
The power of large language models is undeniable, and designing a next-gen AI app can seem intimidating. But thinking about LLMs in the right way removes these barriers, giving you the tools you need to ship your product quickly and robustly.