Bursting the "AI Is Just Memorization"-Bubble

A Deep Dive into Measuring LLM Capacity

Pascal Biese
Jun 04, 2025


Recent advancements in large language models (LLMs) - and more generally, Generative AI - have sparked intense debate about data memorization. Can these models reproduce their training data verbatim? How much information do they actually store? And perhaps most importantly, when does beneficial learning end and problematic memorization begin?

A team of researchers from Meta's FAIR, Google DeepMind, Cornell, and NVIDIA set out to shed some light on these questions - and with success, it seems. Their new paper, titled "How much do language models memorize?", provides a rigorous mathematical framework for measuring memorization and delivers surprising insights about the fundamental capacity limits of transformer models.

What we'll cover in this article:

  • Why existing definitions of memorization fall short

  • A new compression-based framework for measuring memorization

  • The surprising discovery that GPT-style models store ~3.6 bits per parameter (a back-of-envelope sketch follows this list)

  • How memorization relates to the double descent phenomenon

  • Practical scaling laws for membership inference attacks

  • What this means for the future of LLM development
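
To put the ~3.6 bits per parameter figure into perspective, here is a minimal back-of-envelope sketch. The 3.6 bits/parameter estimate comes from the paper; the model sizes below are illustrative round numbers, not models the authors evaluated.

```python
# Back-of-envelope: raw memorization capacity implied by ~3.6 bits/parameter.
BITS_PER_PARAM = 3.6  # the paper's estimate for GPT-style models

def capacity_bits(num_params: float) -> float:
    """Approximate total storage capacity of a model, in bits."""
    return num_params * BITS_PER_PARAM

# Illustrative model sizes (not from the paper): 125M, 1.3B, 7B parameters.
for params in (125e6, 1.3e9, 7e9):
    bits = capacity_bits(params)
    megabytes = bits / 8 / 1e6
    print(f"{params / 1e9:5.2f}B params -> ~{bits:.2e} bits (~{megabytes:,.0f} MB)")
```

By this estimate, even a 7B-parameter model has only about 3 GB of raw storage capacity, far smaller than a typical pretraining corpus, which is part of why memorization alone cannot explain what these models do.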

Are you ready? Let's dive in.

1. The Memorization Problem: Why Current Definitions Don't Work

Before we can measure how much models memorize, we need to define what memorization actually means. This turns out to be surprisingly tricky.

The Extraction Fallacy

Most existing work defines memorization through extraction: if you can prompt a model to generate a specific training sequence, it must have memorized it. But the authors point out a critical flaw in this reasoning. Modern LLMs can be coerced to output almost any string with the right prompt. As they note, "the fact that a model outputs something is not necessarily a sign of memorization."

Consider this example: if you prompt a model with "What is 2^100?" and it correctly responds with "1,267,650,600,228,229,401,496,703,205,376", has it memorized this specific fact, or has it learned to perform exponentiation? The extraction-based definition can't distinguish between these fundamentally different scenarios.
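
To make the problem concrete, here is a minimal sketch of such an extraction-style check. The `calculator_model` stand-in is hypothetical and purely illustrative: it computes the answer rather than recalling it, yet the check still flags it as memorization.

```python
def naive_extraction_test(generate, prompt: str, target: str) -> bool:
    """Extraction-based 'memorization' check: did the model reproduce the target?

    This is the definition the authors criticize: it only inspects the output
    string, so it cannot tell recall apart from computation.
    """
    completion = generate(prompt)
    return target in completion

# A 'model' that genuinely computes the answer instead of recalling it...
def calculator_model(prompt: str) -> str:
    if prompt.startswith("What is 2^"):
        exponent = int(prompt.split("^")[1].rstrip("?"))
        return f"{2 ** exponent:,}"
    return "I don't know."

# ...is still flagged as having 'memorized' the training string.
print(naive_extraction_test(
    calculator_model,
    "What is 2^100?",
    "1,267,650,600,228,229,401,496,703,205,376",
))  # True, even though nothing was recalled verbatim
```

Any definition that only looks at outputs will conflate recall with computation in exactly this way.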

The Stability Problem

Other definitions rely on differential privacy or influence functions, measuring how a model changes when a training example is added or removed. But these approaches have their own limitations:

  • They depend heavily on the training algorithm

  • They measure worst-case behavior rather than typical memorization

  • They can't be applied to a single model in isolation
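
To see where these limitations come from, here is a minimal leave-one-out sketch, with hypothetical `train` and `evaluate_loss` helpers standing in for whatever training and evaluation pipeline is used (this is not the paper's method):

```python
def leave_one_out_score(train, evaluate_loss, dataset: list, index: int) -> float:
    """Stability-style memorization score for a single training example.

    Requires two full training runs, so the result depends on the training
    algorithm and cannot be computed from one already-trained model.
    """
    example = dataset[index]
    model_with = train(dataset)
    model_without = train(dataset[:index] + dataset[index + 1:])
    # A large loss gap means the model relies heavily on having seen this
    # exact example, i.e. something closer to memorization.
    return evaluate_loss(model_without, example) - evaluate_loss(model_with, example)
```

The two training runs in the sketch are exactly what makes these definitions algorithm-dependent and inapplicable to a single model in isolation - the gap the authors set out to close.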

The authors needed something different - a definition that could separate memorization from generalization, work at the sample level, and be independent of the training process.
