LLM Watch

The Week in AI Agents

AI Agents of the Week: Papers You Should Know About

Get ahead of the curve with LLM Watch

Mar 22, 2026
∙ Paid

Executive Summary

Reasoning Efficiency and Balanced Thinking: Large Reasoning Models are powerful but wasteful - they overthink simple problems and underthink hard ones. This week, two papers attack the efficiency question from opposite ends. ReBalance introduces a training-free framework that uses confidence-based steering vectors to dynamically prune redundancy or promote exploration in real time, improving accuracy while reducing output length across nine benchmarks and four model sizes (0.5B to 32B). Meanwhile, Nemotron-Cascade 2 demonstrates that intensive post-training via Cascade RL and multi-domain on-policy distillation can pack gold-medal-level mathematical and coding reasoning into a 30B MoE model with only 3B activated parameters - matching frontier-model performance with 20x fewer parameters. Together, these papers frame a central tension: do you steer the reasoning you already have, or distill better reasoning into a smaller model?
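
To make the steering idea concrete, here is a minimal toy sketch of confidence-gated activation steering. The function name, thresholds, and random vectors are all illustrative assumptions, not ReBalance's actual code; the paper derives its steering directions from model internals, whereas this sketch just shows the gating logic.

```python
import numpy as np

def confidence_steering(hidden, v_prune, v_explore, conf, hi=0.9, lo=0.5, alpha=4.0):
    """Toy sketch (hypothetical interface): when the model is confident,
    nudge the hidden state toward concise continuations; when it is
    unsure, nudge it toward exploration; otherwise leave it untouched."""
    if conf >= hi:            # confident -> prune redundant reasoning
        return hidden + alpha * v_prune
    if conf <= lo:            # uncertain -> promote exploration
        return hidden + alpha * v_explore
    return hidden             # balanced -> no intervention

# Random directions stand in for steering vectors a real system would
# extract from contrastive hidden-state statistics.
rng = np.random.default_rng(0)
h = rng.normal(size=8)
v_p, v_e = rng.normal(size=8), rng.normal(size=8)
steered_hi = confidence_steering(h, v_p, v_e, conf=0.95)   # pruning applied
steered_mid = confidence_steering(h, v_p, v_e, conf=0.7)   # left unchanged
```

The key design point is that the intervention is training-free: it only adds a vector to the residual stream at inference time, gated by a cheap confidence signal.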

Strategic Alignment and Game-Theoretic Behavior: A pair of papers this week reveals a fascinating paradox at the intersection of alignment and multi-agent strategy. Alignment Makes Language Models Normative, Not Descriptive finds that aligned models outperform base models on one-shot textbook games but lose to base models by nearly 10:1 when predicting real human choices in multi-round strategic interactions - bargaining, negotiation, and repeated games where reciprocity and retaliation matter. In contrast, Reasonably Reasoning AI Agents Can Avoid Game-Theoretic Failures proves theoretically and empirically that off-the-shelf reasoning agents can achieve Nash-like equilibrium play zero-shot, without any post-training alignment. For teams deploying agents in economic or competitive environments, the implication is striking: alignment may help with normative compliance but could actively hinder realistic strategic behavior.
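
For readers evaluating agents in such settings, the equilibrium criterion itself is easy to check programmatically. The sketch below is standard textbook game theory, not code from either paper: a pure strategy profile is a Nash equilibrium when neither player can gain by deviating unilaterally.

```python
import numpy as np

def is_pure_nash(payoff_a, payoff_b, i, j):
    """Check whether the pure strategy profile (i, j) is a Nash
    equilibrium of a two-player bimatrix game: no unilateral
    deviation improves either player's payoff."""
    a_best = payoff_a[:, j].max() <= payoff_a[i, j]   # row player can't gain
    b_best = payoff_b[i, :].max() <= payoff_b[i, j]   # column player can't gain
    return a_best and b_best

# Prisoner's Dilemma payoffs (index 0 = cooperate, 1 = defect).
A = np.array([[3, 0],
              [5, 1]])
B = A.T  # symmetric game

print(is_pure_nash(A, B, 1, 1))  # True: mutual defection is the equilibrium
print(is_pure_nash(A, B, 0, 0))  # False: mutual cooperation is not
```

A harness like this is one way to score whether an agent's zero-shot play actually lands on equilibrium strategies rather than merely plausible-sounding ones.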

Memory Architecture for Long-Horizon Agents: Two papers converge on the insight that how agents remember matters more than how much they remember, but they propose competing solutions. AndroTMem diagnoses that performance degradation in long-horizon GUI tasks stems primarily from within-task memory failures and proposes Anchored State Memory (ASM), which improves task completion rates by 5% to 30.16% over full-sequence replay. Memento-Skills takes a different approach entirely: agents build and refine a library of reusable markdown-based skills as externalized memory, achieving 26.2% and 116.2% relative accuracy improvements on the General AI Assistants benchmark and Humanity’s Last Exam, respectively. The shared lesson: structured, selective memory outperforms brute-force replay.
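
The skill-library pattern is worth seeing in miniature. The class below is an illustrative sketch of externalized skill memory, not Memento-Skills' actual implementation: skills are named markdown snippets the agent can save, look up, and refine as it learns from failures.

```python
from dataclasses import dataclass, field

@dataclass
class SkillLibrary:
    """Toy externalized memory: markdown skills keyed by name,
    retrievable by keyword and refinable with lessons learned."""
    skills: dict = field(default_factory=dict)

    def save(self, name: str, markdown: str) -> None:
        self.skills[name] = markdown

    def retrieve(self, query: str) -> list:
        q = query.lower()
        return [md for name, md in self.skills.items()
                if q in name.lower() or q in md.lower()]

    def refine(self, name: str, note: str) -> None:
        # Append a lesson learned so the next task reuses the fix.
        self.skills[name] = self.skills.get(name, "") + f"\n- {note}"

lib = SkillLibrary()
lib.save("csv-parsing", "# CSV parsing\nUse the stdlib csv module; sniff the dialect first.")
lib.refine("csv-parsing", "Handle BOM with encoding='utf-8-sig'.")
print(len(lib.retrieve("csv")))  # -> 1
```

The contrast with full-sequence replay is the point: the agent consults a small, curated artifact rather than its entire interaction history.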

Governance and Organizational Deployment: As agents grow more capable, the question of how to constrain and govern them in organizational settings becomes urgent. The Agentic Business Process Management manifesto articulates a paradigm shift from traditional automation-oriented BPM toward systems built on “framed autonomy,” where agents perceive, reason, and act within explicit process frames. This conceptual framework - demanding explainability, conversational actionability, and self-modification - offers a roadmap for bridging AI, BPM, and multi-agent systems research. It also surfaces a tension with self-improving agent architectures like Memento-Skills, where autonomous evolution may conflict with organizational control requirements.
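
One way to picture framed autonomy is as a whitelist over the agent's action space per process state. The snippet below is a deliberately simplified sketch of that idea, not the manifesto's formalism; the state and action names are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProcessFrame:
    """Toy 'framed autonomy': the agent chooses freely, but only
    among actions the frame permits in its current process state."""
    allowed: dict  # state -> set of permitted actions

    def permit(self, state: str, action: str) -> bool:
        return action in self.allowed.get(state, set())

# Hypothetical two-state approval process.
frame = ProcessFrame(allowed={
    "intake": {"classify", "ask_clarification"},
    "review": {"approve", "escalate"},
})
print(frame.permit("review", "approve"))   # True: inside the frame
print(frame.permit("review", "classify"))  # False: blocked by the frame
```

The governance tension the manifesto raises shows up even here: a self-improving agent that rewrites its own skills would also need authority to renegotiate the frame, which is precisely what organizations may want to withhold.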

Instruction-Guided Generation and Semantic Anchoring: Rounding out the week, SAMA addresses a persistent challenge in instruction-guided video editing: balancing precise semantic modifications with faithful motion preservation. By factorizing the problem into semantic anchoring and motion alignment - and pre-training on motion-centric restoration tasks - SAMA achieves state-of-the-art open-source performance competitive with commercial systems like Kling-Omni. The factorized pre-training alone yields strong zero-shot editing ability, validating the decomposition. For agent builders, SAMA’s architectural insight - anchor the semantics, then align the dynamics - offers a transferable pattern for any domain where agents must plan structural changes while preserving temporal coherence.
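
The anchor-then-align pattern generalizes beyond video, and a toy version makes the decomposition tangible. This sketch is illustrative only, not SAMA's architecture: it applies a "semantic" edit to the first frame, then replays the original frame-to-frame motion on top of the edited content.

```python
import numpy as np

def edit_with_motion_preserved(frames, edit):
    """Toy anchor-then-align: edit the anchor frame's content,
    then re-apply the original dynamics so motion is unchanged."""
    frames = np.asarray(frames, dtype=float)
    deltas = np.diff(frames, axis=0)   # original frame-to-frame motion
    anchor = edit(frames[0])           # semantic change on the anchor
    out = [anchor]
    for d in deltas:                   # re-align motion to edited content
        out.append(out[-1] + d)
    return np.stack(out)

# Simple linear motion; the "edit" shifts content along one axis.
seq = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]])
edited = edit_with_motion_preserved(seq, lambda f: f + np.array([10.0, 0.0]))
```

Every frame inherits the edit while the inter-frame deltas stay identical - the structural change and the temporal dynamics are handled by separate stages, which is the transferable pattern the paragraph above describes.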

© 2026 Pascal Biese