Hi there!
I wrote a blog on my internship experience at HP! ✨
Today, I am going to talk about how you can create a ChatGPT-like assistant on your custom data. ChatGPT, when it was launched around nine months ago, took the world by storm. You could ask a question, and it would generate a response. Its responses were surprisingly fluent but not always factually correct.
ChatGPT and its class of generative AI Large Language Models (LLMs) generate the next word in a sequence, given the previous prompts. Under the hood, they are based on transformers and attention mechanisms built on top of deep learning models. Some researchers at the University of Tennessee, including me, wrote an article explaining how LLMs work.
In today’s letter, I will write about three distinct ways to train your Large Language Models (LLMs). Popularly, this is called “fine-tuning”. The term is not technically correct as only the first of the three approaches that I’m going to describe is fine-tuning. The second approach is called Retrieval-based Learning (or Search-Ask approach). The third approach is In-Context Learning.
🌾 Fine-tuning LLMs
OpenAI has a great introduction to fine-tuning LLMs, specifically GPTs. This is an improvement over few-shot learning where we provide several examples of expected output, in addition to our question.
For example, let’s say we are using GPT to find whether a given tweet expresses sarcasm, irony, happiness, etc. Simply giving it 50 tweets and asking it to find the sentiment wouldn’t lead to great results. However, the same prompt with a few example tweets and their sentiments included would lead to a much better result.
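To make this concrete, here is a minimal sketch of assembling such a few-shot prompt. The instruction wording, tweets, and labels are all made up for illustration; the point is just the structure of examples followed by the query.

```python
def build_few_shot_prompt(examples, tweet):
    """Assemble a few-shot prompt: labeled examples first, then the query."""
    lines = ["Classify the sentiment of each tweet as sarcasm, irony, or happiness.", ""]
    for text, label in examples:
        lines.append(f"Tweet: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The query tweet goes last, with the label left blank for the model to fill in.
    lines.append(f"Tweet: {tweet}")
    lines.append("Sentiment:")
    return "\n".join(lines)

examples = [
    ("Oh great, another Monday. Just what I needed.", "sarcasm"),
    ("Finally got the job offer, best day ever!", "happiness"),
]
prompt = build_few_shot_prompt(examples, "Love being stuck in traffic for two hours.")
print(prompt)
```

This string would then be sent to the completion API; the model tends to continue the pattern and emit just the label.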
Fine-tuning generalises this to a much larger space. Once a model has been fine-tuned, you won't need to provide as many examples in the prompt. This saves costs and enables lower-latency requests.
Some common use cases where fine-tuning can improve results:
Setting the style, tone, format, or other qualitative aspects
Improving reliability at producing a desired output
Correcting failures to follow complex prompts
Handling many edge cases in specific ways
Performing a new skill or task that’s hard to articulate in a prompt
Generally, you need a dataset of prompt-response pairs, typically dozens to hundreds of examples, to achieve this.
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
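As a sketch, here is how you might assemble such pairs into a JSONL training file. This is OpenAI's legacy prompt-completion fine-tuning format shown above; the newer chat-based fine-tuning API expects a list of messages instead. The example pairs are hypothetical, and a real fine-tune needs far more of them.

```python
import json

# Hypothetical training pairs; a real fine-tune needs many more examples.
pairs = [
    {"prompt": "Tweet: Wow, rain again. ->", "completion": " sarcasm"},
    {"prompt": "Tweet: Got my dream internship! ->", "completion": " happiness"},
]

# JSONL: one JSON object per line, which is what the fine-tuning API ingests.
with open("train.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")
```

You would then upload this file through the fine-tuning API; once training finishes, prompts in the same format elicit the fine-tuned behaviour without needing the examples in the prompt.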
Fine-tuning has improved model accuracies substantially. UC Berkeley’s Vicuna was LLaMA fine-tuned on the best conversations shared on sharegpt.com. In its authors’ evaluation, it outperformed the base LLaMA and Stanford’s Alpaca, and was competitive with Google Bard and ChatGPT in many cases. The power of fine-tuning!
Whatever the model learns goes into its long-term memory, i.e., its weights. Therefore, fine-tuning is great for learning a new skill but doesn’t help much with factual recall.
🔍 Retrieval-based Learning
The second approach to enhancing the capabilities of your LLM is by supplementing its knowledge bank. You can think of fine-tuning as regurgitating a textbook the night before the exam, versus retrieval-based learning as having an open-book exam. The model would still need to have enough knowledge of the facts to know where to search, but it can always open the book chapter to learn more. I used to call this the Search-Ask approach, but Retrieval-based Learning sounds fancier.
The first step in this method is to convert all your external knowledge into a database of embeddings. Embedding translates webpages (or any documents) into numerical vectors, enabling LLMs to understand and process them. Vector databases (VectorDBs) such as Chroma, DuckDB, and Pinecone store these embeddings, allowing moderately accurate factual recall.
In my experience, OpenAI’s embeddings work best with the GPT-3.5 or GPT-4 API. You can mix and match these components, thanks to LangChain: using Pinecone as the embedding store and LLaMA as the LLM, for example.
By incorporating these vectors, GPT gains a long-term memory structure. When a user poses a question, the system searches the VectorDB using similarity measures to find relevant information, which becomes the model’s short-term memory. It then supplies the LLM with this short-term memory as context, and the LLM generates the response.
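The retrieve-then-prompt loop above can be sketched in a few lines. This toy version uses a bag-of-words counter in place of a real embedding model, so it runs without any API; in practice you would call an embedding API and store the vectors in a VectorDB like Chroma or Pinecone.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real pipeline calls an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Long-term memory: every document is embedded once, up front.
documents = [
    "The Eiffel Tower is in Paris and is 330 metres tall.",
    "Python dictionaries store key-value pairs.",
    "LLMs generate the next token given the previous tokens.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(question, k=1):
    """Similarity search: rank stored documents against the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

question = "How tall is the Eiffel Tower?"
context = retrieve(question)[0]  # short-term memory for this one query
prompt = f"Context: {context}\n\nQuestion: {question}\nAnswer:"
```

The assembled `prompt` is what gets sent to the LLM, which answers from the retrieved context rather than from its weights alone.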
In the next letter, I will show an example of how this works.
📚 In-Context Learning
The third method is to provide all additional information as context in the prompt. In this approach, webpages are cleaned and converted into text, which is then passed to GPT as context. This enables the model to have a rich contextual understanding.
This method is great for factual recall but is severely limited by the input length. (I suspect Bing’s new search is based on this method.) More than two years ago, OpenAI demonstrated that GPT-3 was much more accurate when given context from web searches than on its own.
When a user asks a question, the question and the supplied documents are assembled into a single pre-prompt: the documents serve as context and the question as the input, and the model generates its output from this structured format. This method is particularly valuable for developing reasoning skills: given the right context, the model can learn to follow a chain of thought and reach logical conclusions.
Unlike fine-tuning, which requires lots of examples and results in permanent changes to the base model, In-Context Learning can help with just a few examples, also known as Few-Shot Learning.
Though highly accurate, this method’s limitation lies in its short-term memory capacity. Following my earlier analogy of textbooks in exams, In-Context Learning is like having a one-page cheatsheet: everything you supply must fit within the model’s context window.
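A rough sketch of that cheatsheet constraint: context chunks are packed greedily into a fixed budget, and anything that doesn't fit is simply dropped. A word count stands in here for a real token limit, and the chunks are made-up placeholders.

```python
def pack_prompt(chunks, question, max_words=40):
    """Greedily pack context chunks into a fixed word budget;
    whatever doesn't fit on the 'cheatsheet' is dropped."""
    budget = max_words - len(question.split())
    kept = []
    for chunk in chunks:
        n = len(chunk.split())
        if n > budget:
            break  # context window exhausted
        kept.append(chunk)
        budget -= n
    return "Context:\n" + "\n".join(kept) + f"\n\nQuestion: {question}\nAnswer:"

chunks = [
    "Chapter 1 summary: " + "word " * 10,
    "Chapter 2 summary: " + "word " * 10,
    "Chapter 3 summary: " + "word " * 30,  # too large for the remaining budget
]
prompt = pack_prompt(chunks, "What happens in chapter 2?")
```

Real systems face the same trade-off with token limits: the answer can only be as good as what survives the packing step.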
Again, OpenAI has a good introduction to how this method works, especially using their API.
⭕ Conclusion
Training LLMs on custom data is a multifaceted process that requires careful consideration of the method used. Fine-tuning offers a straightforward way to teach new skills but may lack factual depth. Retrieval-based Learning enhances the model’s recall ability but may struggle with skill training. In-Context Learning excels in accuracy and reasoning skills but is constrained by context-length limits.
In practice, the choice of method depends on the specific requirements of the task and the characteristics of the data at hand. A hybrid approach, leveraging the strengths of each method, may present an innovative way forward in training LLMs on custom data.
Investments in understanding these techniques, experimenting with their combinations, and developing new methodologies will continue to propel the field of AI and machine learning. The pursuit of more accurate, skillful, and nuanced models remains an exciting and essential journey in modern technology.
➡️ Up Next…
Next week, I am going to cover an example of how I trained CustomGPT on all of my personal notes and what I learned from it. Stay tuned!
See you next week!
— Harsh