LLM Watch

AI Agents of the Week

The research you should know about

Pascal Biese
Oct 12, 2025

This week, we take a look at breakthroughs spanning memory systems, tool use, planning, multi-agent collaboration, and self-improvement. Researchers are tackling long-standing challenges, from long-term memory and dynamic tool selection to better coordination and agents that learn on the fly. Key highlights include:

  • Dynamic Memory Architectures: A-Mem introduces an agentic memory system that organizes knowledge Zettelkasten-style, linking new information to past memories to continuously refine an agent’s understanding (first sketch below). This dynamic memory outperformed prior static-memory baselines and is a step toward long-lived, context-aware agents.

  • Learning Tool Capabilities: TOOLMEM equips agents with a “tool capability memory” that records the strengths and weaknesses of different AI tools (second sketch below). By remembering which tool excels in which scenario, agents improved task performance by choosing the right tool for the job, a crucial advance for tool-using autonomous systems.

  • Integrating Symbolic Planning: To reduce LLM agents’ errors in complex tasks, Agent+P combines neural and symbolic approaches. It runs a symbolic planner over a learned UI graph to guide an LLM-based user interface agent (third sketch below), boosting success rates by up to 14% and cutting unnecessary steps by ~38%. This showcases the power of structured planning in keeping autonomous agents on track.

  • Multi-Agent Collaboration Frameworks: New paradigms are emerging to enable multiple AI agents to work together without rigid central control. A blackboard architecture lets agents post and retrieve information on a shared board, volunteering for tasks based on expertise (fourth sketch below), yielding 13–57% better task success than traditional “master-slave” setups. Meanwhile, the ALMAS framework envisions autonomous LLM agents taking on specialized roles in a software engineering team, coordinating to handle an entire project lifecycle.

  • Structured Self-Improvement: Researchers explored agents that learn from their own mistakes. Agentic Context Engineering (ACE) treats an agent’s prompt context as an evolving playbook, growing and refining its strategies with each interaction to avoid “context collapse” and brevity bias (fifth sketch below). ACE achieved a +10.6% gain in success on agent benchmarks at lower cost, even matching a GPT-4-level agent while using a smaller model. Complementing this, a Test-Time Self-Improvement (TT-SI) method lets agents identify their own failures and generate new training examples on the fly to fine-tune themselves, boosting accuracy by ~5.5% with a tiny fraction of the usual training data.
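
To make A-Mem's idea concrete, here is a minimal sketch of the Zettelkasten-style linking pattern: each new note gets connected to semantically similar past notes. All names here are ours, embed_fn stands in for any text-embedding model, and the real system additionally uses an LLM to evolve and refine existing memories, which this toy version omits.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryNote:
    """One note in the agent's memory, Zettelkasten-style."""
    content: str
    embedding: list[float]
    links: list[int] = field(default_factory=list)  # indices of related notes

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

class AgenticMemory:
    """Sketch: store notes and bidirectionally link each new note to similar old ones."""

    def __init__(self, embed_fn, link_threshold: float = 0.75):
        self.embed_fn = embed_fn            # any text -> vector function
        self.link_threshold = link_threshold
        self.notes: list[MemoryNote] = []

    def add(self, content: str) -> MemoryNote:
        note = MemoryNote(content, self.embed_fn(content))
        for i, old in enumerate(self.notes):
            if cosine(note.embedding, old.embedding) >= self.link_threshold:
                note.links.append(i)                   # link new note to related memory
                old.links.append(len(self.notes))      # back-link the old note as well
        self.notes.append(note)
        return note
```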
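
TOOLMEM's core mechanism, as we read it, is a memory of per-tool performance that informs tool selection. A rough sketch of that bookkeeping follows; the class and method names are hypothetical, and the paper's memory representation is richer than the raw success counts used here.

```python
from collections import defaultdict

class ToolCapabilityMemory:
    """Sketch: track per-tool success by task category, then pick the best tool."""

    def __init__(self):
        # outcomes[tool][category] = [successes, attempts]
        self.outcomes = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def record(self, tool: str, category: str, success: bool) -> None:
        stats = self.outcomes[tool][category]
        stats[0] += int(success)
        stats[1] += 1

    def best_tool(self, category: str, default: str) -> str:
        """Return the tool with the highest observed success rate in this category."""
        scores = {}
        for tool, cats in self.outcomes.items():
            s, n = cats.get(category, (0, 0))
            if n:
                scores[tool] = s / n
        return max(scores, key=scores.get) if scores else default

# Usage with invented tool names:
mem = ToolCapabilityMemory()
mem.record("model_a", "chart_editing", True)
mem.record("model_b", "chart_editing", False)
print(mem.best_tool("chart_editing", default="model_a"))  # model_a
```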
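
The Agent+P pattern pairs a symbolic planner with the LLM. Assuming the learned UI graph boils down to screens and the actions connecting them (our toy graph below is invented), a plain shortest-path search already yields the kind of step-by-step plan that keeps the agent from wandering.

```python
from collections import deque

# Hypothetical UI graph: screen -> {action: next_screen}
UI_GRAPH = {
    "home":     {"open_settings": "settings", "open_search": "search"},
    "settings": {"open_wifi": "wifi", "back": "home"},
    "search":   {"back": "home"},
    "wifi":     {"back": "settings"},
}

def plan(start: str, goal: str) -> list[str] | None:
    """BFS over the UI graph returns the shortest action sequence, which the
    LLM agent can then be handed one step at a time as a guardrail."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        screen, actions = queue.popleft()
        if screen == goal:
            return actions
        for action, nxt in UI_GRAPH.get(screen, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return None  # goal unreachable from this screen

print(plan("home", "wifi"))  # ['open_settings', 'open_wifi']
```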
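
For the blackboard architecture, the essential move is that no controller assigns work: agents read the shared board and volunteer based on expertise. A minimal sketch, where the confidence scoring is a toy stand-in for whatever self-assessment the paper's agents actually perform:

```python
class Blackboard:
    """Shared board: any agent can post tasks and write back results."""

    def __init__(self):
        self.tasks: list[dict] = []
        self.results: dict[str, str] = {}

    def post(self, task_id: str, description: str) -> None:
        self.tasks.append({"id": task_id, "desc": description})

class Agent:
    def __init__(self, name: str, expertise: set[str]):
        self.name = name
        self.expertise = expertise

    def confidence(self, task: dict) -> float:
        # Toy scoring: fraction of task keywords this agent knows.
        words = set(task["desc"].lower().split())
        return len(words & self.expertise) / max(len(words), 1)

def run_round(board: Blackboard, agents: list[Agent], threshold: float = 0.3):
    """Each open task goes to the most confident volunteer; no central
    controller assigns work, unlike master/worker pipelines."""
    for task in list(board.tasks):
        score, winner = max((a.confidence(task), a) for a in agents)
        if score >= threshold:
            board.results[task["id"]] = f"{winner.name} handled: {task['desc']}"
            board.tasks.remove(task)

board = Blackboard()
board.post("t1", "review python code")
run_round(board, [Agent("coder", {"python", "code"}), Agent("writer", {"docs"})])
print(board.results)  # {'t1': 'coder handled: review python code'}
```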
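
Finally, the ACE idea of a context that grows instead of being rewritten can be approximated in a few lines. This is a loose sketch with invented names; the actual method uses LLM-driven generation, reflection, and curation of playbook entries rather than the naive deduplication shown here.

```python
class Playbook:
    """Sketch of an ACE-style evolving context: lessons are appended as
    discrete, deduplicated entries instead of being summarized away,
    the failure mode the authors call 'context collapse'."""

    def __init__(self, max_entries: int = 50):
        self.entries: list[str] = []
        self.max_entries = max_entries

    def update(self, lesson: str) -> None:
        if lesson not in self.entries:      # grow incrementally, never rewrite
            self.entries.append(lesson)
        if len(self.entries) > self.max_entries:
            self.entries.pop(0)             # drop oldest, keep recent strategies

    def render(self) -> str:
        """Serialize the playbook into the agent's system prompt."""
        return "Strategies learned so far:\n" + "\n".join(
            f"- {e}" for e in self.entries
        )

pb = Playbook()
pb.update("Validate tool output before acting on it.")
pb.update("Validate tool output before acting on it.")  # duplicate, ignored
print(pb.render())
```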

In the article below, we unpack each of these developments: their core innovations, the problems they tackle, why they matter, and what they signal for the future of agentic AI.
