In this issue:
A survey on SLMs
A way towards more brain-like inference
How to better count the r’s in strawberry
MLOps/GenAI World is all about solving real-world problems and sharing genuine experiences with production-grade AI systems.
Join leaders and engineers from Microsoft, Hugging Face, BlackRock, and many more for the following tracks:
Real World Case Studies
Business & Strategy
Technical & Research (levels 1-7)
Workshops (levels 1-7)
In-person coding sessions
Get access to 30+ virtual workshops, 60+ in-person talks, and 90+ hours of recordings by claiming your personal discount.
1. A Survey of Small Language Models
Watching: “Small” Language Models (paper)
What problem does it solve? While Large Language Models (LLMs) have been dominating the headlines, Small Language Models (SLMs) are becoming increasingly important. SLMs are designed to deliver strong performance while requiring minimal computational resources, which makes them well suited to on-device, mobile, and edge deployments. As the demand for language models in resource-constrained environments grows, a comprehensive understanding of SLMs becomes crucial.
How does it solve the problem? The survey presents a novel taxonomy for categorizing the methods used to optimize SLMs. At its core are model compression techniques, which shrink a model while maintaining its performance; two prominent examples are pruning and quantization. Pruning removes less important weights or connections from the model, reducing its complexity, while quantization lowers the numerical precision of the model's parameters, yielding smaller models and faster inference. By systematically organizing these methods, the survey provides a clear overview of the approaches used to create efficient SLMs.
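As a rough illustration of two of these techniques, here is a minimal PyTorch sketch (a toy two-layer model and an arbitrary pruning ratio, not anything from the survey) that applies magnitude pruning followed by dynamic int8 quantization:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for a small language model block.
model = nn.Sequential(
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
)

# Magnitude pruning: zero out the 30% of weights with the smallest
# absolute value in each linear layer (ratio chosen arbitrarily here).
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the pruning mask into the weights

# Dynamic quantization: store linear weights as int8 and dequantize
# on the fly at inference time, shrinking the model in memory.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 512])
```

The two are complementary: pruning reduces the number of effective parameters, while quantization reduces the storage and compute cost of each parameter that remains.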
What's next? Despite these advances, several open challenges remain: further improving the efficiency-performance trade-off, developing more effective compression techniques, and ensuring that SLMs stay robust and generalize well across tasks and domains. The field also needs standardized benchmark datasets and evaluation metrics tailored to SLMs, so that comparisons are fair and progress can be tracked.
2. A prescriptive theory for brain-like inference
Watching: Brain-like inference (paper)
What problem does it solve? The Evidence Lower Bound (ELBO) is a widely used objective function for training deep generative models like Variational Autoencoders (VAEs). While ELBO maximization has been useful in interpreting generative models, including diffusion models, it is often considered too broad to provide specific guidance for designing architectures in neuroscience or machine learning. This work aims to bridge the gap between ELBO maximization and prescriptive theories for NeuroAI.
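For reference, the ELBO is a lower bound on the data log-likelihood that combines a reconstruction term with a KL penalty keeping the approximate posterior q(z|x) close to the prior p(z):

```latex
\mathrm{ELBO}(x) = \mathbb{E}_{q(z \mid x)}\bigl[\log p(x \mid z)\bigr]
                 - \mathrm{KL}\bigl(q(z \mid x) \,\|\, p(z)\bigr)
                 \le \log p(x)
```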
How does it solve the problem? The authors show that maximizing ELBO under Poisson assumptions for general sequence data leads to a spiking neural network called the iterative Poisson VAE (iP-VAE). This model performs Bayesian posterior inference through its membrane potential dynamics, establishing a closer connection to biological neurons compared to previous brain-inspired predictive coding models based on Gaussian assumptions. The iP-VAE learns sparser representations and demonstrates better generalization to out-of-distribution samples compared to amortized and iterative VAEs.
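The full iP-VAE dynamics are beyond a snippet, but the Poisson ingredient can be sketched. Assuming latent spike counts with inferred rates `rate_q` and prior rates `rate_p` (names invented here for illustration), the KL term of the ELBO has a simple closed form, and the reconstruction term can use a Poisson likelihood. This is a sketch under those assumptions, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def poisson_kl(rate_q: torch.Tensor, rate_p: torch.Tensor) -> torch.Tensor:
    """Elementwise KL( Poisson(rate_q) || Poisson(rate_p) ).

    Closed form: r_q * log(r_q / r_p) - r_q + r_p.
    """
    return rate_q * (rate_q.log() - rate_p.log()) - rate_q + rate_p

def poisson_elbo(x, log_rate_recon, rate_q, rate_p):
    """Toy ELBO under Poisson assumptions (illustrative, not the paper's code)."""
    # Reconstruction: Poisson negative log-likelihood of the data given
    # predicted log-rates; negated to give a log-likelihood term.
    recon = -F.poisson_nll_loss(log_rate_recon, x, log_input=True,
                                reduction="sum")
    # Regularizer: keep inferred firing rates close to the prior rates.
    kl = poisson_kl(rate_q, rate_p).sum()
    return recon - kl
```

In the paper, the objective is tied to membrane potential dynamics that iteratively refine the posterior; the snippet only shows the kind of static loss such dynamics would optimize.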
What's next? The findings suggest that optimizing ELBO with Poisson assumptions provides a solid foundation for developing prescriptive theories in NeuroAI. This approach could lead to more biologically plausible models that better capture the dynamics of real neurons while maintaining the benefits of deep generative models. Additionally, the insights gained from this work could inspire new architectures and training strategies in both neuroscience and machine learning.
3. Counting Ability of Large Language Models and Impact of Tokenization
Watching: Tokenization (paper)
What problem does it solve? Transformers, the architecture behind most modern Large Language Models (LLMs), have inherent limitations when it comes to reasoning. Unlike recurrent neural networks, Transformers lack recurrent connections, which bounds their computational depth. This places them in the complexity class TC0, making them theoretically incapable of solving tasks whose required reasoning depth grows with input length. Counting, a fundamental component of many reasoning tasks, is one such case: performing it inductively requires reasoning depth that grows linearly with the length of the input.
How does it solve the problem? Recent work has shown that Chain of Thought (CoT) reasoning can alleviate some of these architectural limitations in counting tasks, but the role of tokenization has received little attention. Unlike specialized expert models, which often use character-level tokenization, LLMs typically rely on byte-level (BPE) tokenizers, which fundamentally alters how the input is segmented before any reasoning takes place. This study investigates the impact of tokenization on the counting abilities of LLMs and uncovers substantial performance variations across different input tokenizations.
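To make the difference concrete, here is a small sketch (assuming the tiktoken package is installed; the exact token split depends on the vocabulary) contrasting BPE with character-level tokenization on the word from this issue's headline:

```python
import tiktoken

word = "strawberry"

# Byte-pair encoding, as used by many LLMs: the word collapses into a
# few multi-character tokens, so single letters are never directly
# visible to the model.
enc = tiktoken.get_encoding("cl100k_base")
print([enc.decode([t]) for t in enc.encode(word)])  # e.g. ['str', 'awberry']

# Character-level tokenization: each letter is its own token, so
# counting reduces to attending to individual positions.
chars = list(word)
print(chars.count("r"))  # 3
```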
What's next? The findings of this study highlight the importance of considering tokenization choices when designing and evaluating LLMs for reasoning tasks. By understanding how tokenization can undermine models' theoretical computability, researchers can develop new tokenization methods that enhance reasoning capabilities in LLMs. This work opens up new avenues for improving the reasoning abilities of Transformer-based models and brings us closer to creating LLMs that can handle reasoning tasks more reliably.
Papers of the Week:
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
Engineering Trustworthy AI: A Developer Guide for Empirical Risk Minimization
Designing LLM-Agents with Personalities: A Psychometric Approach
LongReward: Improving Long-context Large Language Models with AI Feedback
Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design
Standardization Trends on Safety and Trustworthiness Technology for Advanced AI