Written by Brandon Yu | 2 min read
In the landscape of conversational AI, particularly with the advent of Large Language Models (LLMs), a prevalent issue is the occurrence of "hallucinations." These are instances where the AI asserts incorrect information with unwarranted confidence. Hallucinations in this context are not just occasional missteps; they are a recurrent challenge that underscores the limitations of these advanced systems.
AI hallucinations can be caused by various factors, including insufficient, outdated, or low-quality training data, overfitting, use of idioms or slang expressions, and adversarial attacks. These errors present a complex problem: they can undermine the trustworthiness of LLMs.
Imagine a tool that's incredibly resourceful, yet occasionally presents fiction as fact. This is the current reality with conversational AIs—they are powerful, yet imperfect, particularly in discerning and relaying verifiable truths.
How LLMs work
LLMs are based on the transformer architecture, a type of neural network that processes sequential data, such as text, more efficiently than traditional recurrent neural networks by attending to all positions in a sequence in parallel rather than step by step.
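For readers who want a peek under the hood, here is a minimal sketch of the scaled dot-product self-attention that transformers are built around, written in plain PyTorch with random weights standing in for learned ones. Real LLMs stack many such layers with multiple heads, feed-forward blocks, and positional information.

```python
# A minimal sketch of scaled dot-product self-attention, the core operation
# of a transformer, with random weights standing in for learned ones.
import torch

def self_attention(x: torch.Tensor) -> torch.Tensor:
    """x: (sequence_length, d_model) embeddings for one sequence."""
    d_model = x.shape[-1]
    # In a trained model these are learned projection matrices.
    W_q, W_k, W_v = (torch.randn(d_model, d_model) for _ in range(3))
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    # Every position attends to every other position in parallel,
    # which is what lets transformers avoid step-by-step recurrence.
    scores = (q @ k.T) / d_model**0.5
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

tokens = torch.randn(6, 32)          # 6 "tokens", 32-dimensional embeddings
print(self_attention(tokens).shape)  # torch.Size([6, 32])
```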
It’s important to note that LLMs aren’t infallible. Their outputs can be inaccurate or biased, largely depending on their training data. The training process itself is computationally intensive, involving both a pre-training phase and a fine-tuning phase.
During pre-training, the model is exposed to a massive corpus of unlabelled text. The aim of this phase is to train the model to predict the next word in a sentence or sequence. As you can imagine, gaps inevitably creep in during this phase, and they can contribute to downstream hallucinations.
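To make that objective concrete, here is a small sketch using the openly available GPT-2 model from Hugging Face's transformers library as a stand-in for a much larger LLM; it scores every token in the vocabulary as a possible continuation of an illustrative prompt.

```python
# A sketch of the next-word prediction objective using the small, open GPT-2
# model (pip install torch transformers). The prompt is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (batch, sequence_length, vocab_size)

# The model assigns a probability to every vocabulary token as the next word.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: {p.item():.3f}")
```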
This is where the second step, fine-tuning, comes in. The pre-trained model is further trained on a smaller, task-specific dataset relevant to the end subject matter. For a mental health LLM, for instance, you would feed it leading research and best practices in that space.
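As a hedged illustration, the snippet below shows what a slice of such a task-specific dataset could look like in the chat-style JSONL format used by several fine-tuning APIs (OpenAI's among them). The mental-health content here is invented purely to show the structure.

```python
# A hedged sketch of a task-specific fine-tuning dataset in chat-style JSONL.
# The example content is invented and only illustrates the structure.
import json

examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a supportive mental health assistant."},
            {"role": "user", "content": "I've been feeling overwhelmed at work lately."},
            {"role": "assistant", "content": "That sounds really hard. What has been weighing on you the most?"},
        ]
    },
    # ...many more curated examples drawn from vetted research and best practices
]

with open("fine_tune_dataset.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```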
Causes of LLM hallucinations
- Overfitting to Training Data: LLMs are trained on a diverse set of data from the internet, including articles, books, and websites. Sometimes, they might overfit to particular patterns or anomalies in the training data. This means the model can pick up and replicate quirks, biases, or errors that were present in its training set. When faced with a query, the LLM might generate a response that seems right because it echoes these ingrained patterns but is actually incorrect in the given context.
- Lack of Knowledge: Despite their extensive training, most public LLMs have no access to real-time information. They are trained up to a certain cutoff date, so anything published after their last training update is simply missing from their knowledge base, leaving it outdated or incomplete. Hallucinations can also be triggered by the prompt itself, particularly prompts designed to confuse the AI or ones that lean on slang expressions or idioms.
- The Nature of Predictive Modeling: At their core, LLMs are predictive models; they predict the next most likely word in a sequence based on probabilities. This approach is not rooted in the logical progression of ideas or facts. Thus, when generating language, the model might "hallucinate" by stringing together words and phrases that are statistically plausible but semantically nonsensical or factually incorrect in the real world. The model doesn't aim to be logical or factual; it aims to be likely, according to its training, as the sketch after this list illustrates.
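The toy sketch below, with invented probabilities, makes that point concrete: the continuation is chosen purely because it is likely, so a fluent but false statement can easily be sampled.

```python
# A toy illustration (invented probabilities) of purely likelihood-driven
# generation: the sampler has no notion of whether a continuation is true.
import random

prompt = "The inventor of the telephone was"
continuations = {
    " Alexander Graham Bell.": 0.60,   # likely and correct
    " Thomas Edison.": 0.25,           # fluent-sounding but wrong
    " Antonio Meucci.": 0.10,          # plausible, historically debated
    " a very tall lighthouse.": 0.05,  # unlikely nonsense
}

choice = random.choices(
    population=list(continuations),
    weights=list(continuations.values()),
)[0]

# Roughly one time in four this prints a confident, fluent, false answer.
print(prompt + choice)
```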
Hallucinations of this kind are challenging for developers and for anyone who interfaces with AI. Fundamentally, they erode trust in AI models, and mitigating and proactively addressing them is a strong point of focus for AI developers.
Solving for hallucinations
In our experience building out PokerGPT, our AI-based poker coach, we weren’t immune to the effects of hallucinations. Our platform, powered by GPT-4, had trouble reading the nuances of poker shorthand, such as suits or the completion of poker hands. This left gaps that PokerGPT filled with hallucinations as it attempted to reconcile them.
So how did we solve these?
There were three main strategies that we incorporated within our prompt engineering.
- Gave specific examples to PokerGPT. By feeding the model specific instances where poker nuances typically occur (e.g., calculating pot sizes or reading suits), we provided a template that PokerGPT could follow and replicate in its own answers (a sketch of this prompt structure follows this list).
- Repeated the LLM’s interpretation of the user’s input back to the user. This temporary solution has helped us catch the areas where the model tends to hallucinate. By having it translate complex poker jargon back into plain English, we were able to identify which parts of the input PokerGPT struggled with.
- Incorporated a human in the loop. In our architecture, we’ve built in ways for expert human coaches to observe and supplement AI responses. We take these coach responses and use them directly to fine-tune our prompts. In the interim, users still get high-quality feedback from expert coaches, so their experience is not affected by hallucinations.
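Here is a minimal sketch of how the first two ideas can be combined in a prompt, assuming the OpenAI Python client; the shorthand, wording, and example hands are invented for illustration and are not our production PokerGPT prompt.

```python
# A minimal sketch of few-shot examples plus echoing the interpretation back,
# assuming the OpenAI Python client (pip install openai). The prompt wording
# and hands are invented; this is not the production PokerGPT prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = """You are a poker coach.
First restate the hand in plain English so the user can confirm your reading,
then give your advice.

Example:
Input: "Hero AhKd BTN, 3bet to 9bb, flop Qs Jh 2c, pot 19.5bb"
Interpretation: Hero holds the ace of hearts and king of diamonds on the
button, three-bet to 9 big blinds; the flop is queen of spades, jack of
hearts, two of clubs; the pot is 19.5 big blinds.
Advice: ...
"""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Hero TsTh CO, flop 9c 8c 2d, pot 6bb"},
    ],
)
print(response.choices[0].message.content)
```

Because the model is shown a worked interpretation before it answers, mistakes in reading the shorthand tend to surface in the echoed interpretation, where the user or a coach can catch them.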
Our predictions for AI hallucinations in the future
Looking ahead, we can anticipate significant advancements in the realm of AI, particularly concerning the phenomenon of hallucinations in Large Language Models (LLMs).
As the cost of developing and deploying LLMs continues to fall, these models will become more accessible and affordable. This democratization will enable more comprehensive and intricate training, using lengthier prompts and a broader array of examples, which will naturally help to curtail the frequency of hallucinations by refining the models’ predictions. At its recent Dev Day, OpenAI introduced GPT-4 Turbo, a model that offers two to three times the speed and cost efficiency of its predecessor, along with a context window roughly four times larger. This expanded context window makes room for more detailed examples in a model’s system prompt to address inaccuracies, while preserving the gains in speed and cost.
Additionally, we're likely to see the rise of specialized LLMs tailored for distinct applications. These models will be trained on targeted datasets, such as proprietary company information or highly focused topics, leading to fewer hallucinations and more accurate data outputs.
Furthermore, LLMs are expected to evolve to a point where they can assess their own confidence levels. When unsure, they will have mechanisms to signal the need for human verification or to solicit specific user feedback, enhancing their reliability and overall performance. Advancements like Microsoft’s AutoGen enable the potential for LLMs to check the accuracy of other LLMs through multi-agent collaboration.
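As a conceptual sketch of that cross-checking idea, the snippet below has a second model call review a first model's draft answer and flag it for human review. It assumes the OpenAI Python client and is not AutoGen's actual API; it only illustrates the verifier pattern.

```python
# A conceptual sketch of one LLM checking another's output, assuming the
# OpenAI Python client. This is not AutoGen's API; it only illustrates the
# multi-agent "verifier" pattern described above.
from openai import OpenAI

client = OpenAI()

def draft_answer(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

def verify(question: str, draft: str) -> str:
    # A second pass reviews the draft and signals when a human should step in.
    prompt = (
        f"Question: {question}\nDraft answer: {draft}\n"
        "Check the draft for factual errors. Reply 'OK' if it looks correct, "
        "otherwise reply 'NEEDS HUMAN REVIEW' and explain why."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

question = "Who won the World Series of Poker Main Event in 2019?"
print(verify(question, draft_answer(question)))
```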
It truly is an exciting time to be learning and building with Generative AI. With a growing focus on mitigating these hallucinations, one can take active steps such as fine-tuning AI models or implementing feedback loops to override them. This evolution will mark a significant leap forward in our interaction with conversational AI, making these tools more trustworthy and effective as they become an integral part of our digital ecosystem.