Bursting the "AI Is Just Memorization" Bubble
A Deep Dive into Measuring LLM Capacity
Recent advancements in large language models (LLMs) - and more generally, Generative AI - have sparked intense debate about data memorization. Can these models reproduce their training data verbatim? How much information do they actually store? And perhaps most importantly, when does beneficial learning end and problematic memorization begin?
A team of researchers from Meta's FAIR, Google DeepMind, Cornell, and NVIDIA set out to shed light on these questions - and with success, it seems. Their new paper, titled "How much do language models memorize?", provides a rigorous mathematical framework for measuring memorization and delivers surprising insights about the fundamental capacity limits of transformer models.
What we'll cover in this article:
Why existing definitions of memorization fall short
A new compression-based framework for measuring memorization
The surprising discovery that GPT-style models store ~3.6 bits per parameter
How memorization relates to the double descent phenomenon
Practical scaling laws for membership inference attacks
What this means for the future of LLM development
Are you ready? Let's dive in.
1. The Memorization Problem: Why Current Definitions Don't Work
Before we can measure how much models memorize, we need to define what memorization actually means. This turns out to be surprisingly tricky.
The Extraction Fallacy
Most existing work defines memorization through extraction: if you can prompt a model to generate a specific training sequence, it must have memorized it. But the authors point out a critical flaw in this reasoning. Modern LLMs can be coerced to output almost any string with the right prompt. As they note, "the fact that a model outputs something is not necessarily a sign of memorization."
Consider this example: if you prompt a model with "What is 2^100?" and it correctly responds with "1,267,650,600,228,229,401,496,703,205,376", has it memorized this specific fact, or has it learned to perform exponentiation? The extraction-based definition can't distinguish between these fundamentally different scenarios.
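To see the ambiguity concretely, here is a minimal sketch - my own illustration, not code from the paper - of what an extraction-style check does, together with a demonstration that the exact answer above can be derived rather than stored. The `generate` argument is a hypothetical callable standing in for a model.

```python
# Toy sketch of an extraction-style memorization check (illustrative only).
# `generate` is a hypothetical callable mapping a prompt to model output.
def extraction_flag(generate, prompt: str, target: str) -> bool:
    """Flag `target` as 'memorized' if the model reproduces it for `prompt`."""
    return target in generate(prompt)

# But an identical output can be computed rather than recalled: the exact
# answer from the example above is reproducible by plain exponentiation.
target = "1,267,650,600,228,229,401,496,703,205,376"
computed = f"{2**100:,}"      # derive the digits, no lookup involved
assert computed == target     # same string, zero memorization required
```

An extraction test would flag both a model that stored these digits and one that learned to exponentiate, which is exactly the distinction the authors argue it cannot make.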
The Stability Problem
Other definitions rely on differential privacy or influence functions, measuring how a model changes when a training example is added or removed. But these approaches have their own limitations (sketched in code after this list):
They depend heavily on the training algorithm
They measure worst-case behavior rather than typical memorization
They can't be applied to a single model in isolation
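For concreteness, here is a conceptual sketch of a leave-one-out, stability-style measure - my own illustration with hypothetical `train` and `loss` stand-ins, not the paper's method - that makes these limitations visible: the score depends on the training procedure and requires retraining, so it cannot be computed from a single model in isolation.

```python
from typing import Any, Callable, Sequence

def leave_one_out_influence(
    train: Callable[[Sequence[Any]], Any],   # training procedure -> model (hypothetical)
    loss: Callable[[Any, Any], float],       # per-example loss under a model (hypothetical)
    dataset: Sequence[Any],
    index: int,
) -> float:
    """How much does removing dataset[index] change the model's loss on it?"""
    full_model = train(dataset)
    held_out = [x for i, x in enumerate(dataset) if i != index]
    ablated_model = train(held_out)
    # A large gap suggests the example was "memorized" in the stability sense.
    return loss(ablated_model, dataset[index]) - loss(full_model, dataset[index])
```

Note that the answer changes with the optimizer, the data order, and the random seed, and that it is undefined if all you have is one already-trained model.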
The authors needed something different - a definition that could separate memorization from generalization, work at the sample level, and be independent of the training process.
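The precise definition is in the paper itself, but to give a rough sense of what a compression-based, sample-level measure could look like - this is my assumption for illustration, with hypothetical `log_prob` stand-ins, not the paper's exact formula - one can compare the code length of a sample under the trained model with its code length under a reference model:

```python
import math
from typing import Callable, Sequence

def bits_to_encode(log_prob: Callable[[Sequence[int]], float],
                   tokens: Sequence[int]) -> float:
    """Code length of `tokens` under a model, in bits (= -log2 likelihood)."""
    return -log_prob(tokens) / math.log(2)   # log_prob returns a natural-log likelihood

def memorization_score(model_lp: Callable[[Sequence[int]], float],
                       reference_lp: Callable[[Sequence[int]], float],
                       tokens: Sequence[int]) -> float:
    """Bits saved by the trained model relative to a reference model:
    the larger the saving, the more sample-specific information it stores."""
    return bits_to_encode(reference_lp, tokens) - bits_to_encode(model_lp, tokens)
```

Unlike the extraction and stability approaches above, a quantity of this shape is defined per sample, needs only the trained model plus a reference, and makes no assumptions about how training was carried out.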