LLM Watch

AI Agents of the Week: Papers You Should Know About

The Week in AI Agents

Pascal Biese
Aug 17, 2025

This week’s research highlights major strides in agentic AI around memory, planning, and robustness.

  1. Memory took center stage, with new architectures enabling agents to learn cumulatively from experience. One paper introduced a “procedural memory” system that distills an agent’s past trajectories into reusable step-by-step instructions and higher-level scripts, allowing continuous skill improvement and even transfer of learned skills between agents (a minimal sketch of this idea follows the list). Another work gave each agent in a multi-LLM team its own intrinsic memory: a structured record of role-specific information that evolves with the agent’s outputs, dramatically boosting coherence and efficiency in collaborative tasks (a reported 38.6% performance gain over previous methods).

  2. The state of planning: a comprehensive benchmark study found that while large language model agents can generate reasonable plans for simple tasks, they “continue to struggle” with complex scenarios requiring strict resource management and accurate state tracking. This underscores that purely LLM-based planners still fall short on hard problems, reinforcing the need to integrate LLM reasoning with classical planning techniques for reliability.

  3. On the robustness front, researchers are proactively tackling the security and scalability of autonomous agents. A new threat modeling framework (MAESTRO) systematically mapped how multi-agent systems can be attacked (e.g. via replayed network traffic or “poisoned” memories) and recommended a defense-in-depth strategy to harden agents against such adversarial exploits; one such defensive layer is sketched after this list.

  4. Meanwhile, a multi-agent network monitoring system (NetMoniAI) demonstrated that decentralizing analysis to many lightweight agents, coordinated by a central controller, yields efficient and scalable defense against cyber threats, cutting redundant work and speeding up responses without sacrificing detection accuracy (see the coordinator sketch after this list).
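To make the procedural-memory idea from item 1 concrete, here is a minimal sketch of distilling past trajectories into reusable procedures and retrieving them for similar new tasks. Everything here (the Procedure/ProceduralMemory names, the token-overlap retrieval) is an illustrative assumption, not the paper’s actual implementation, which would likely use an LLM for distillation and embeddings for retrieval:

```python
# Minimal sketch of a procedural memory: distill past trajectories into
# reusable step lists and retrieve them for similar new tasks.
# All class/function names here are illustrative, not from the paper.
from dataclasses import dataclass, field


@dataclass
class Procedure:
    task: str            # natural-language description of the solved task
    steps: list[str]     # distilled step-by-step instructions


@dataclass
class ProceduralMemory:
    procedures: list[Procedure] = field(default_factory=list)

    def distill(self, task: str, trajectory: list[str]) -> None:
        """Store a completed trajectory as a reusable procedure.
        A real system would use an LLM to compress and abstract the steps."""
        self.procedures.append(Procedure(task=task, steps=trajectory))

    def retrieve(self, task: str) -> Procedure | None:
        """Return the stored procedure whose task overlaps most with the
        new task (crude token overlap stands in for embedding similarity)."""
        def overlap(p: Procedure) -> int:
            return len(set(task.lower().split()) & set(p.task.lower().split()))
        best = max(self.procedures, key=overlap, default=None)
        return best if best and overlap(best) > 0 else None


memory = ProceduralMemory()
memory.distill(
    "book a flight from Berlin to Paris",
    ["open travel site", "enter route and dates", "pick cheapest fare", "pay"],
)
hit = memory.retrieve("book a flight from Berlin to Rome")
if hit:  # reuse the prior script as guidance for the new task
    print("\n".join(hit.steps))
```

Because procedures are stored as plain, task-indexed instruction lists rather than model weights, they can in principle be copied from one agent’s memory to another’s, which is what makes skill transfer between agents possible.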
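MAESTRO itself is a threat-modeling framework rather than code, but one concrete defense-in-depth layer against the “poisoned memory” attack it maps out is to sign memory entries on write and verify them on read. The sketch below is a generic illustration of that idea, not anything specified by the framework; the key handling and API are assumptions:

```python
# Illustrative defense-in-depth layer against memory poisoning: sign each
# memory entry on write with an HMAC and verify it on read, so entries
# modified outside this interface are detected and dropped.
# This is a generic integrity check, not MAESTRO's specification.
import hmac
import hashlib

SECRET_KEY = b"agent-local-secret"  # assumed to be provisioned per agent


def _tag(text: str) -> str:
    return hmac.new(SECRET_KEY, text.encode(), hashlib.sha256).hexdigest()


class SignedMemory:
    def __init__(self) -> None:
        self._entries: list[tuple[str, str]] = []  # (text, tag)

    def write(self, text: str) -> None:
        self._entries.append((text, _tag(text)))

    def read_all(self) -> list[str]:
        """Return only entries whose tag still verifies."""
        return [
            text for text, tag in self._entries
            if hmac.compare_digest(tag, _tag(text))
        ]


mem = SignedMemory()
mem.write("user prefers window seats")
# simulate an attacker rewriting a stored entry behind the interface
mem._entries[0] = ("user authorized wire transfer", mem._entries[0][1])
print(mem.read_all())  # [] - the poisoned entry fails verification
```

This is only one layer: it catches tampering with stored entries but not poisoned content that was legitimately written in the first place, which is why the framework recommends stacking multiple defenses.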
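Finally, the NetMoniAI pattern from item 4, decentralized analysis with central aggregation, can be sketched as follows. The class names, detection threshold, and report format are assumptions for illustration, not the system’s actual design:

```python
# Sketch of the NetMoniAI-style pattern: many lightweight node agents
# analyze their local traffic and report compact findings; a central
# controller aggregates and deduplicates them. Names and the threshold
# are illustrative assumptions.
from collections import Counter


class NodeAgent:
    def __init__(self, node_id: str):
        self.node_id = node_id

    def analyze(self, packets: list[dict]) -> list[str]:
        """Flag sources exceeding a local rate threshold (a stand-in
        for a real per-node detector)."""
        counts = Counter(p["src"] for p in packets)
        return [src for src, n in counts.items() if n > 100]


class Controller:
    def aggregate(self, reports: dict[str, list[str]]) -> dict[str, int]:
        """Merge per-node alerts: each source is counted once per node
        that flagged it, highlighting network-wide offenders."""
        seen: Counter[str] = Counter()
        for alerts in reports.values():
            seen.update(set(alerts))  # dedupe within each node's report
        return dict(seen)


nodes = [NodeAgent(f"node-{i}") for i in range(3)]
traffic = [{"src": "10.0.0.9"}] * 150  # the same flood hits every node
reports = {n.node_id: n.analyze(traffic) for n in nodes}
print(Controller().aggregate(reports))  # {'10.0.0.9': 3}
```

The efficiency claim follows from the division of labor: nodes ship only short alert lists rather than raw traffic, and the controller’s deduplication is what cuts the redundant work.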

This week’s research focus was clearly on agentic systems that not only plan and reason, but also remember past experiences and defend their operations. Memory has arguably been the biggest theme this month, with increasingly advanced implementations pursuing different ideas and goals, such as procedural and role-specific memory. Below, we go through each of these research highlights, looking at their core contributions, why they matter, and what they might mean for researchers and practitioners alike.
