AI Agents of the Week: Papers You Should Know About

Get ahead of the curve with LLM Watch

Aug 10, 2025

In the first full week of August, I want to highlight five new papers that advance the state of autonomous agents in learning, coordination, and system-level robustness:

One paper introduces a self-evolving agent that learns to operate unfamiliar software without any labeled data or human guidance – a major step toward tool-generalist agents.
Another work presents a vision-language shopping sandbox that reveals surprising biases in AI buyers, giving us tools to probe and debug agent behavior in real-world economic settings.
A routing strategy for multi-agent systems shows how token-efficient context sharing can both reduce cost and improve accuracy – a blueprint for more scalable agent swarms.
A new orchestration framework lets agents dynamically rebalance roles and recover from teammate failure, boosting the resilience of autonomous teams under real-world stress.
And a timely survey proposes "AgentOps" as a new discipline, outlining how to monitor autonomous systems, detect when they misbehave, and respond.

This week’s papers are shifting the conversation from one-off demos to sustained deployments: how can agents learn new tools, audit themselves, collaborate effectively, and stay aligned over time? Below, we dive into each paper, explaining its core innovations, the problems it tackles, and the new capabilities it unlocks.

Continue reading this post for free, courtesy of Pascal Biese.
