Watch #2: Small Models Matter and the Fight Against Hallucinations
Including a reading list at the bottom
Foreword:
Thank you for the first 200 subscriptions in under a week! While it’s much faster for me to build an audience on LinkedIn, I think that in the long term, a loyal community on Substack will be more valuable.
This week, I’ll attach a reading list of papers in addition to the paper spotlights. So if all you’re looking for is a reading list without further commentary or analysis, feel free to skip to the “Papers of the Week” section at the bottom.
Have a great day all,
Pascal
In this issue:
The NExT generation of GPT models is modality-agnostic
It’s not about the size of a model, but what you train it on
“Think before you speak!”, a human solution to a machine problem
1. NExT-GPT: Any-to-Any Multimodal LLM
Watching: NExT-GPT (paper)
What problem does it solve? Text is only one modality of many. Not all information can be efficiently captured and conveyed with text, and researchers like Yann LeCun - among many others - believe that multimodal representations will lead to a deeper, more robust understanding of the world.
How does it solve the problem? NExT-GPT is trained on four different modalities in parallel: text, image, audio and video. More importantly, it can also output any of these modalities. In addition to the usual Transformer architecture, the framework borrows components from Diffusion Models and Multimodal Adapter research. The former are well known for their success in Stable Diffusion and Midjourney, while the latter is one of the most promising techniques for adding any modality you want to a model.
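To make the adapter idea a bit more concrete, here is a minimal sketch of how pooled features from a frozen modality encoder could be projected into an LLM’s embedding space. All names, dimensions and design choices below are my own illustrative assumptions, not NExT-GPT’s actual implementation.

```python
import torch
import torch.nn as nn

class ModalityAdapter(nn.Module):
    """Projects pooled features from a frozen modality encoder (e.g. an image
    encoder) into an LLM's token embedding space. Dimensions, token count and
    naming are illustrative assumptions, not NExT-GPT's actual code."""

    def __init__(self, encoder_dim: int = 1024, llm_dim: int = 4096, num_tokens: int = 32):
        super().__init__()
        self.num_tokens = num_tokens
        self.proj = nn.Linear(encoder_dim, llm_dim * num_tokens)

    def forward(self, encoder_features: torch.Tensor) -> torch.Tensor:
        # encoder_features: (batch, encoder_dim) pooled output of a frozen encoder
        batch = encoder_features.shape[0]
        tokens = self.proj(encoder_features).view(batch, self.num_tokens, -1)
        # (batch, num_tokens, llm_dim): these "soft tokens" would be prepended to
        # the embedded text prompt before feeding the sequence to the LLM backbone.
        return tokens
```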
What’s next? NExT-GPT isn’t the first project to go in this direction, but it’s arguably the first one that provides a convincing demo and workflow. This might be the future of Generative AI - GPT-5 *cough*. We’re still early, though, and I personally don’t expect this technology to take off until next year.
2. Textbooks Are All You Need II: phi-1.5 technical report
Watching: phi-1.5 (paper)
What problem does it solve? Models are getting bigger and bigger, leaving behind not only consumers but also any company or research lab that doesn’t play in the Champions League in terms of budget and talent. And while scale alone has brought a lot of improvements, quantity doesn’t seem to help much when it comes to hallucinations, harmful content and genuine understanding.
How does it solve the problem? The team from Microsoft took a quality-first approach. Phi-1.5 is the continuation of their success with Phi-1 - both models were trained on high-quality textbook data - and it shows remarkable capabilities far beyond what one would expect from a 1.3B parameter model. As the researchers put it themselves, it’s a 1.3B model behaving more like a 13B model.
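For intuition only, here is a toy sketch of the quality-first idea: filtering a corpus with an abstract quality scorer. The scorer, the names and the threshold are hypothetical and not taken from the phi-1.5 report.

```python
from typing import Callable, Iterable, Iterator

def filter_by_quality(
    documents: Iterable[str],
    quality_score: Callable[[str], float],
    threshold: float = 0.8,
) -> Iterator[str]:
    """Keep only documents that a quality scorer rates above a threshold.
    The scorer (e.g. a classifier for how 'textbook-like' a passage is) is
    deliberately left abstract; names and threshold are hypothetical."""
    for doc in documents:
        if quality_score(doc) >= threshold:
            yield doc

# Example with a trivial stand-in scorer:
corpus = ["A clear explanation of gradient descent ...", "BUY NOW!!! limited offer"]
kept = list(filter_by_quality(corpus, lambda d: 0.0 if "BUY NOW" in d else 0.9))
```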
What’s next? There have been ongoing discussions ever since their first paper. Back then, sceptics raised concerns about data leakage, which the team denied, pointing to their rigorous data selection criteria. This time around, the scepticism has continued, and I have to admit that some of the results are so astonishing that they are hard to believe. Time, transparency and further research will hopefully unravel this mystery for us.
3. Cognitive Mirage: A Review of Hallucinations in Large Language Models
Watching: LLM Hallucinations (paper)
What problem does it solve? “Hallucinations” has probably been one of the most-used words in tech this year - to the point where some people don’t want to hear about it anymore. But there’s a clear reason for it: companies don’t like uncertainty, and the last thing they want is a model spreading false information - potentially even information harmful to their business. As a result, research on ways to control Large Language Models and minimize hallucinations has become very popular.
How does it solve the problem? Since this is a review paper, it doesn’t directly solve the problem. But the authors provide a comprehensive overview of the research so far and, from that, have built several taxonomies that help in understanding the different research directions and approaches - one such example is their taxonomy for Hallucination Detection. The paper is a solid starting point for anyone who wants to catch up with the literature.
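As a concrete illustration of one direction such surveys cover - evidence-based detection - here is a hedged sketch that flags a generated claim when no retrieved passage entails it. The retriever and the entailment scorer (e.g. an NLI model) are left as placeholders; this is a generic pattern, not a method proposed by the paper.

```python
from typing import Callable, List

def flag_potential_hallucination(
    claim: str,
    retrieve_evidence: Callable[[str], List[str]],
    entailment_score: Callable[[str, str], float],
    threshold: float = 0.5,
) -> bool:
    """Flag a generated claim when no retrieved evidence passage entails it
    above the threshold. Retriever and entailment scorer are placeholders -
    a generic pattern for illustration, not the survey's own method."""
    evidence = retrieve_evidence(claim)
    if not evidence:
        return True  # nothing was found that could support the claim
    best = max(entailment_score(passage, claim) for passage in evidence)
    return best < threshold
```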
What’s next? Hallucinations are a big problem, and some argue that we’ll never be able to avoid them completely. Others have even argued that hallucinations can be positive and should be explored as a form of creativity. There’s an acceptable level of error for every task, but tolerance is typically very low for customer-facing applications. If we really want mainstream industry adoption, there’s still a way to go.
Papers of the Week:
Large Language Models optimizing themselves and finding the best prompts for each situation
CoT on steroids: bringing structure to Chain-of-Thought prompting
Tracking entities and their states with Large Language Models
TaskLAMA: Making Large Language Models understand complex tasks
Quantifying and attributing hallucinations in Large Language Model output
Hypothesis Search: inductive reasoning with Large Language Models