Introduction to LIMO: Less is More for LLM Reasoning
DeepSeek-R1 was only the beginning: find out how LIMO takes data efficiency to another level
The recent paper "LIMO: Less is More for Reasoning" challenges foundational assumptions about how large language models (LLMs) acquire complex reasoning capabilities. By demonstrating that meticulously curated instruction data—as few as 817 examples—can outperform models trained on 100x more data, the authors propose a paradigm shift in how we approach reasoning in LLMs.
This article explores the technical innovations, empirical findings, and broader implications of this work, and considers how LLM reasoning might change in the near future.
The LIMO Hypothesis: A New Lens on Reasoning
At its core, the Less-Is-More Reasoning (LIMO) Hypothesis states that:
Sophisticated reasoning capabilities emerge when two conditions converge:
Rich domain knowledge embedded in the model's parameters during pre-training.
Precisely orchestrated demonstrations (cognitive templates) during fine-tuning.
This hypothesis overturns the long-held belief that complex reasoning tasks require massive datasets (e.g., >100k samples). Instead, LIMO argues that modern foundation models (e.g., Qwen2.5, Llama 3) already encode extensive domain knowledge; the challenge lies in eliciting this knowledge through high-quality examples that guide the model to "think" systematically.
The authors define two requirements for success:
Pre-trained Knowledge: Modern LLMs like Qwen2.5-32B incorporate vast mathematical content during pre-training (3.7T tokens for Llama 3’s math-focused training).
Inference-Time Scaling: Techniques allowing extended reasoning chains (e.g., parallel sampling, tree search) provide the "cognitive workspace" for multi-step problem-solving.
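To make the inference-time scaling idea concrete, here is a minimal sketch of parallel sampling with majority voting (often called self-consistency). This is not code from the LIMO paper; `sample_fn` is a hypothetical stand-in for any stochastic LLM call that returns the final answer extracted from one reasoning chain.

```python
from collections import Counter
from typing import Callable

def majority_vote(sample_fn: Callable[[str], str], question: str, n: int = 16) -> str:
    """Sample n independent reasoning chains and return the most common answer.

    sample_fn (hypothetical) runs one stochastic generation (temperature > 0)
    and extracts the final answer from the resulting chain of thought.
    """
    answers = [sample_fn(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

Spending extra compute at inference time this way lets the model explore multiple reasoning paths before committing to an answer, which is the "cognitive workspace" the authors refer to.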
Data Curation: Quality Over Quantity
Data quality is core to the LIMO hypothesis, and ensuring it requires careful data curation. LIMO's success hinges on a systematic curation process that prioritizes depth over breadth:
Question Selection
Difficulty: Problems are filtered using state-of-the-art models (e.g., Qwen2.5-Math-7B-Instruct), retaining only those with <10% solve rates.
Diversity: Selected from advanced benchmarks (AIME, OlympiadBench) and multilingual sources (Chinese Gaokao), ensuring coverage of algebra, geometry, and proof-based challenges.
Out-of-Distribution (OOD) Focus: 30% of questions intentionally deviate from standard datasets to test generalization.
From an initial pool of 10M+ candidate problems, only 817 survived this rigorous filtering process.
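For illustration, the difficulty filter described above could look something like the sketch below. This is an assumption about the pipeline's shape, not the authors' actual code; `attempt` stands in for one graded inference call to a strong baseline model such as Qwen2.5-Math-7B-Instruct.

```python
from typing import Callable

def solve_rate(attempt: Callable[[str], str], question: str,
               reference: str, n_attempts: int = 32) -> float:
    # Fraction of sampled attempts whose final answer matches the reference.
    correct = sum(attempt(question) == reference for _ in range(n_attempts))
    return correct / n_attempts

def filter_hard(problems: list[dict], attempt: Callable[[str], str],
                threshold: float = 0.10) -> list[dict]:
    # Keep only problems the baseline model solves less than 10% of the
    # time, mirroring LIMO's difficulty criterion.
    return [p for p in problems
            if solve_rate(attempt, p["question"], p["answer"]) < threshold]
```

The key design choice is that difficulty is measured empirically, by how often a capable model actually solves a problem, rather than by human-assigned labels.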