LLM Watch

The Week in AI Agents

AI Agents of the Week: Papers You Should Know About

Pascal Biese
Nov 02, 2025

Executive Summary

  • Parallel Planning with Tools: New frameworks are enabling large language model (LLM) agents to plan tasks as dependency graphs, allowing parallel tool use instead of strictly sequential ReAct-style execution. This boosts efficiency and accuracy on complex multi-step queries.
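The dependency-graph idea can be sketched in a few lines: steps declare what they depend on, and anything whose inputs are ready runs concurrently. This is an illustrative design of our own, not any specific framework's API; the tool functions are toy stand-ins.

```python
import asyncio

# Minimal sketch of dependency-graph planning with parallel tool calls
# (illustrative design, not a specific framework's API): each step declares
# the steps it depends on, and every step whose dependencies are satisfied
# runs concurrently instead of waiting in a strict ReAct-style sequence.

async def run_graph(steps):
    """steps maps name -> (dependency names, async fn taking dict of results)."""
    results, pending = {}, dict(steps)
    while pending:
        ready = [name for name, (deps, _) in pending.items()
                 if all(d in results for d in deps)]
        if not ready:
            raise ValueError("dependency cycle in plan")
        # Independent steps execute in parallel.
        outputs = await asyncio.gather(
            *(pending[name][1]({d: results[d] for d in pending[name][0]})
              for name in ready))
        for name, out in zip(ready, outputs):
            results[name] = out
            del pending[name]
    return results

# Toy "tools" standing in for real API calls.
async def fetch_a(_deps): return "A"
async def fetch_b(_deps): return "B"
async def combine(deps): return deps["a"] + deps["b"]

plan = {
    "a": ((), fetch_a),
    "b": ((), fetch_b),
    "final": (("a", "b"), combine),
}
print(asyncio.run(run_graph(plan))["final"])  # AB
```

Here "a" and "b" share no edge, so they run in the same wave; "final" waits for both, which is exactly the efficiency win over a strictly sequential loop.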

  • Agents that Self-Improve: Researchers demonstrated that LLM-based agents can learn by playing against themselves. A triplet of roles (question proposer, solver, judge) co-evolving via reinforcement learning led to measurable gains in general reasoning ability with minimal human supervision.
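The proposer/solver/judge loop can be caricatured with numeric stubs to show the control flow. This is our toy sketch: the real work co-evolves LLM policies with reinforcement learning, whereas here "skill" is just a scalar nudged by the judge's reward.

```python
import random

# Toy caricature of the proposer/solver/judge self-play loop (numeric stubs
# only; the actual research trains LLM policies via RL). The proposer
# generates questions, the solver answers, the judge scores, and the reward
# nudges the solver while question difficulty ramps up with its skill.

def proposer(difficulty):
    a = random.randint(0, 10 ** difficulty)
    b = random.randint(0, 10 ** difficulty)
    return (a, b), a + b                      # question, ground-truth answer

def solver(question, skill):
    a, b = question
    # A noisy solver whose success probability is its current "skill".
    return a + b if random.random() < skill else a + b + 1

def judge(prediction, truth):
    return 1.0 if prediction == truth else 0.0

def self_play(rounds=500, skill=0.5, lr=0.01):
    difficulty = 1
    for _ in range(rounds):
        question, truth = proposer(difficulty)
        reward = judge(solver(question, skill), truth)
        # Correct answers push skill up, wrong ones push it down; the
        # proposer raises difficulty as the solver improves.
        skill = min(1.0, max(0.0, skill + lr * (reward - 0.5)))
        difficulty = 1 + int(skill * 3)
    return skill, difficulty
```

The key property the papers exploit is visible even in the stub: no human labels enter the loop, since the proposer supplies questions and the judge supplies the training signal.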

  • Multi-Agent Collaboration & Debate: New benchmarks and methods tackled multi-agent interaction. The DEBATE dataset captures thousands of real human debate messages to evaluate how well LLM agents simulate authentic group dynamics; results show that role-playing agents diverge from human behaviors even after fine-tuning. Another study found that giving agents channels to communicate and to verify each other’s moves (or to receive feedback from the environment) dramatically improved their cooperative problem-solving and trustworthiness.

  • Long-Term Memory & Structured Reasoning: Innovative agent architectures are integrating hierarchical planning with memory. One new framework organized agents in a tree structure with parent-child divisions of labor and a long-term memory store. This yielded more flexible reasoning, efficient error correction, and reuse of past knowledge to improve performance on complex tasks like code generation.
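A tree of agents with a shared long-term memory can be sketched as follows. The structure (parents delegating subtasks to children, results cached for reuse) follows the bullet above, but the class names and memory layout are our assumptions, not the framework's actual API.

```python
# Sketch of a tree-of-agents layout with shared long-term memory (assumed
# structure, not a specific framework's API). Parent nodes split a task into
# one subtask per child; solved results are cached so future queries reuse
# past knowledge instead of recomputing it.

MEMORY = {}  # long-term store shared across the tree, keyed by task text

class AgentNode:
    def __init__(self, name, solve_fn=None):
        self.name, self.solve_fn, self.children = name, solve_fn, []

    def spawn(self, name, solve_fn=None):
        child = AgentNode(name, solve_fn)
        self.children.append(child)
        return child

    def run(self, task):
        if task in MEMORY:            # reuse past knowledge
            return MEMORY[task]
        if self.solve_fn:             # leaf: do the work itself
            result = self.solve_fn(task)
        else:                         # parent: delegate one subtask per child
            parts = [c.run(f"{task}/{c.name}") for c in self.children]
            result = "; ".join(parts)
        MEMORY[task] = result         # remember for future queries
        return result

root = AgentNode("planner")
root.spawn("coder", lambda t: f"code for {t}")
root.spawn("tester", lambda t: f"tests for {t}")
print(root.run("feature-X"))  # code for feature-X/coder; tests for feature-X/tester
```

Error correction fits naturally here: a parent can discard one child's bad subtree result and re-delegate without restarting the whole task, while the memory store keeps the siblings' good results.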

  • Addressing Known Limitations: Researchers are also identifying blind spots in current agents. For example, LLM-based agents lack temporal awareness by default, a form of “temporal blindness” that causes mis-timed tool use. A dedicated evaluation shows that models often misjudge when to re-call tools without explicit time cues. Another comparative study confirmed that even top LLMs still struggle with certain logical reasoning tasks humans find trivial, underscoring the need for continued advances in agent reasoning and alignment.
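One way to make the re-call problem concrete is to timestamp cached tool results and re-invoke the tool once a freshness budget expires. This is our illustrative sketch of the failure mode and a simple mitigation, not the evaluation protocol from the paper; the `weather` tool is a hypothetical stand-in.

```python
import time

# Illustrative take on "temporal blindness" (our sketch, not the paper's
# method): attach timestamps to cached tool results and re-call the tool once
# a freshness budget expires, instead of letting an agent reuse stale
# observations indefinitely.

class TimedToolCache:
    def __init__(self, max_age_s):
        self.max_age_s = max_age_s
        self.store = {}  # args -> (timestamp, result)

    def call(self, tool, args, now=None):
        now = time.time() if now is None else now
        hit = self.store.get(args)
        if hit and now - hit[0] < self.max_age_s:
            return hit[1]             # fresh enough: reuse the cached result
        result = tool(args)           # stale or missing: re-call the tool
        self.store[args] = (now, result)
        return result

calls = []
def weather(city):  # hypothetical time-sensitive tool
    calls.append(city)
    return f"forecast for {city}"

cache = TimedToolCache(max_age_s=60)
cache.call(weather, "Berlin", now=0)    # first call hits the tool
cache.call(weather, "Berlin", now=30)   # 30s later: still fresh, no re-call
cache.call(weather, "Berlin", now=120)  # 120s later: stale, re-calls the tool
print(len(calls))  # 2
```

An agent without the timestamp check would behave like a cache with an infinite `max_age_s`: it never re-calls, which is precisely the mis-timed tool use the bullet describes.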


© 2025 Pascal Biese