LLM Watch

The Week in AI Agents

AI Agents of the Week

Agent research you should know about

Pascal Biese
Oct 19, 2025

Executive Summary

Dynamic Memory as Action: Researchers introduced a Memory-as-Action framework that lets an autonomous agent manage its own working memory by actively deleting or editing context, instead of relying on preset heuristics. By framing memory management as part of the agent’s policy (learned via reinforcement learning), the agent can strategically forget irrelevant details and focus on long-term objectives. This yielded improved task performance and efficiency on long-horizon tasks by preventing context overload.
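
To make the idea concrete, here is a minimal sketch of what folding memory edits into an agent's action space could look like: the same policy that produces task actions also emits delete/edit operations on working memory. All class and field names below are illustrative, assumed for this sketch, and not the paper's actual API.

```python
# Minimal sketch of the Memory-as-Action idea: memory edits live in the
# same action space as task actions, so a learned policy (not a fixed
# heuristic) decides what stays in the agent's working context.
# MemoryEntry, Action, and Agent are illustrative names, not the paper's.
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    content: str

@dataclass
class Action:
    kind: str          # "respond" | "delete" | "edit"
    index: int = -1    # which memory slot to delete/edit
    payload: str = ""  # new text for "edit", output for "respond"

@dataclass
class Agent:
    memory: list = field(default_factory=list)

    def context(self) -> str:
        # The prompt the LLM actually sees is built from working memory,
        # so deletes/edits directly shape all future reasoning.
        return "\n".join(e.content for e in self.memory)

    def step(self, action: Action):
        if action.kind == "delete" and 0 <= action.index < len(self.memory):
            del self.memory[action.index]  # strategically forget
        elif action.kind == "edit" and 0 <= action.index < len(self.memory):
            self.memory[action.index] = MemoryEntry(action.payload)
        elif action.kind == "respond":
            return action.payload          # an ordinary task action
        return None

# Under RL, the policy emitting these Actions is rewarded on final task
# success, so it learns *when* forgetting helps rather than following a
# preset eviction rule.
agent = Agent([MemoryEntry("tool output: 404 error"),
               MemoryEntry("goal: book flight")])
agent.step(Action(kind="delete", index=0))  # drop an irrelevant detail
print(agent.context())                      # -> "goal: book flight"
```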

Multi-Agent RL Breakthrough: A new method called AT-GRPO applied on-policy reinforcement learning to multiple collaborating LLM agents. It introduced a grouping strategy that trains agents per role and per turn, overcoming the instability of standard RL in multi-agent settings. The result was a massive jump in performance: on long-horizon planning tasks, multi-agent accuracy leapt from ~14% to ~98% (vs. a single-agent RL baseline), with notable gains on coding (+~5%) and math (+~13%) reasoning tasks as well. This demonstrates that co-training agents can dramatically enhance their planning and reasoning abilities.
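
For intuition, here is a hedged sketch of the grouping idea: in GRPO-style training, each sampled rollout's advantage is its reward standardized within a group, and the reported trick is to form those groups per (role, turn), so a rollout is only compared against peers playing the same role at the same turn. The `Rollout` layout is my own illustration, not the paper's code.

```python
# Sketch of role- and turn-wise grouping for GRPO-style advantages.
# Grouping by (role, turn) means a planner's turn-2 rollout is only
# compared with other planner turn-2 rollouts, which stabilizes
# multi-agent training. The Rollout structure is illustrative only.
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class Rollout:
    role: str        # e.g. "planner" or "coder"
    turn: int        # which turn in the multi-agent episode
    reward: float
    advantage: float = 0.0

def group_relative_advantages(rollouts):
    groups = defaultdict(list)
    for r in rollouts:
        groups[(r.role, r.turn)].append(r)  # key = (role, turn)
    for members in groups.values():
        mu = mean(m.reward for m in members)
        sigma = pstdev([m.reward for m in members]) or 1.0  # avoid /0
        for m in members:
            # Reward standardized only against same-role, same-turn peers.
            m.advantage = (m.reward - mu) / sigma

samples = [Rollout("planner", 1, 0.0), Rollout("planner", 1, 1.0),
           Rollout("coder", 2, 1.0), Rollout("coder", 2, 0.2)]
group_relative_advantages(samples)
for s in samples:
    print(s.role, s.turn, round(s.advantage, 2))
```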

Efficiency via Shared Cache: To address redundant computation in agent teams, the KVCOMM framework enables agents to share their “thought process” caches. It cleverly reuses transformer key-value caches across agents by aligning their context offsets. KVCOMM achieved ~70% reuse of computation across diverse multi-agent tasks (tool use, math, coding) with no loss in output quality, and delivered up to 7.8× speedups in a 5-agent setting. This points to a future where agent swarms communicate more efficiently, avoiding wasted cycles.
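
The sketch below illustrates only the shape of the idea: a shared pool keyed by message content, where a KV block computed at one context position is reused at another after a positional adjustment. Real KV tensors are position-dependent (e.g. via rotary embeddings), and estimating that adjustment accurately is the paper's contribution; the `shift_positions` function here is a hypothetical stand-in.

```python
# Shape-of-the-idea sketch of cross-agent KV-cache reuse. A shared pool
# maps a message's content to the KV block computed for it and the
# offset it was computed at; another agent receiving the same message
# at a different offset reuses the block after a positional shift
# instead of re-running prefill. `shift_positions` is a stand-in for
# KVCOMM's actual offset-alignment mechanism.
import hashlib

class SharedKVPool:
    def __init__(self):
        self.pool = {}           # content hash -> (offset, kv_block)
        self.hits = self.misses = 0

    @staticmethod
    def _key(text):
        return hashlib.sha256(text.encode()).hexdigest()

    def get_or_compute(self, text, offset, prefill):
        key = self._key(text)
        if key in self.pool:
            cached_offset, kv = self.pool[key]
            self.hits += 1
            # Reuse across agents: adjust for the new context offset
            # rather than recomputing keys/values from scratch.
            return shift_positions(kv, offset - cached_offset)
        self.misses += 1
        kv = prefill(text, offset)      # the expensive transformer pass
        self.pool[key] = (offset, kv)
        return kv

def shift_positions(kv, delta):
    # Placeholder: a real system would re-rotate position-dependent
    # keys (e.g. RoPE) by `delta`; here we just tag the shift.
    return {"kv": kv, "shifted_by": delta}

# Two agents sharing one message: the second call is a cache hit.
pool = SharedKVPool()
fake_prefill = lambda text, off: f"<kv for {text!r} @ {off}>"
pool.get_or_compute("tool result: 42", offset=10, prefill=fake_prefill)
pool.get_or_compute("tool result: 42", offset=87, prefill=fake_prefill)
print(pool.hits, pool.misses)   # -> 1 1
```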

Generating Harder Agent Tasks: A “ProgSearch” data-generation pipeline tackled the challenge of training agents for long, tool-using missions. It synthesizes question-answer tasks of progressively increasing difficulty, using a baseline web agent in the loop to ensure each new task sits just beyond current capabilities. The resulting dataset, though smaller, contained 2× more diverse tool-use actions than prior datasets and produced agents that avoid repetitive strategies. Models fine-tuned on this data outperformed those trained on conventional data by 8-23% on benchmarks. This underscores that smarter training data, not just bigger models, is key to autonomous agent prowess.
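
A small sketch of the generation loop as described: keep extending a candidate task until the baseline agent's success rate drops below a threshold, so every accepted task sits just past the frontier of current capability. `extend_task` and `baseline_agent` are hypothetical stand-ins for the pipeline's actual components.

```python
# Sketch of a progressive-difficulty data loop: a candidate task is
# extended (e.g. with an extra retrieval hop or tool step) until the
# baseline agent starts failing it. `extend_task` and `baseline_agent`
# are hypothetical stand-ins, not the pipeline's real components.
import random

random.seed(0)

def baseline_agent(task):
    # Stand-in: the real pipeline runs an actual web agent; here,
    # success gets less likely as the task accumulates hops.
    return random.random() > 0.25 * task.count("->")

def extend_task(task):
    return task + " -> follow-up lookup"   # add one more hop

def generate_hard_task(seed_task, trials=4, max_success_rate=0.5):
    task = seed_task
    while True:
        successes = sum(baseline_agent(task) for _ in range(trials))
        if successes / trials <= max_success_rate:
            return task            # just beyond the agent's reach: keep
        task = extend_task(task)   # still too easy: add difficulty

print(generate_hard_task("find the paper's first author"))
```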

Broader Trends: A unifying theme is structured self-reflection and adaptability. From memory systems that let agents forget and evolve their knowledge structures, to multi-agent algorithms that foster cooperative problem-solving, the research this week pushes AI agents toward greater autonomy and long-horizon competency. The community is also beginning to formalize what it means for an AI agent society to be reliable and safe: a new modeling framework defines dozens of verifiable properties (liveness, safety, fairness, etc.) for multi-agent task orchestration, aiming to ensure these powerful agents remain correct and secure even as they become more independent.

In the deep dive below, we unpack each of these developments: their core innovations, why they matter for autonomous AI, the problems they tackle, and what they signal for the future of agentic AI.
