LLM Watch

The Week in AI Agents

AI Agent of the Week: Papers You Should Know About

Get ahead of the curve with LLM Watch

Pascal Biese
Oct 26, 2025

Executive Summary

  • Memory & Self-Reflection as Learning Tools: New techniques allow Large Language Model (LLM) agents to learn on the fly by storing experiences and critiques in long-term memory. One framework combined episodic (instance-specific) and semantic (generalized) memory to adapt without retraining, boosting accuracy by 24.8% over standard retrieval methods. This memory-driven, reflective learning approach makes agents more adaptive and interpretable, hinting that continual self-feedback can replace expensive fine-tuning (see the first sketch after this summary).

  • Multi-Agent Systems & “Thought” Sharing: Autonomous agents are increasingly tackling tasks as teams. A notable multi-agent system (EDR) coordinated specialized sub-agents (planning, web search, code analysis, etc.) and a reflection loop to produce enterprise reports, outperforming prior agent systems on open-ended benchmarks. In parallel, researchers proposed letting agents communicate beyond natural language - essentially “telepathic” sharing of latent thoughts. They prove that agents can exchange hidden states directly via identifiable shared “ideas,” and demonstrate that this thought communication markedly improves collaboration (second sketch below).

  • Better Long-Horizon Reasoning: Several advances addressed the challenge of planning and learning over long tasks. A new credit assignment method called SALT constructs trajectory graphs to assign per-step rewards, stabilizing reinforcement learning for multi-step tasks. By disentangling good and bad actions in long sequences, SALT boosted performance on complex benchmarks like WebShop and ALFWorld (third sketch below). Meanwhile, a world-model evaluation protocol (WorldTest) separates exploration from testing to gauge how well agents understand environment dynamics beyond just reward-hacking. Using a 43-environment suite (AutumnBench), researchers found that humans still far surpass agents in predicting consequences and planning, revealing substantial headroom for truly generalizable world models.

  • Lifelong and Efficient Learning: Pushing toward continual learning, one NeurIPS-bound paper introduced Continual Knowledge Adaptation (CKA-RL), which stores key knowledge vectors from past tasks and reuses them on new tasks. This prevented catastrophic forgetting and improved forward transfer by 8%, as the agent could accumulate skills over time. In a similar vein, the Memo architecture improved long-term memory usage in embodied agents by periodically summarizing past observations into compact embeddings (fourth sketch below). These summaries let a transformer policy handle very long timeframes with far less compute, remaining robust even when context windows must be truncated. Both approaches point to more memory-efficient agents that can learn indefinitely.

  • Tool Use & Modular Reasoning: Instead of relying on one giant model, agents are learning to use specialized tools. An analysis of vision-LLMs showed they often hallucinate or over-rely on text cues. The proposed solution: an agent-based architecture that interleaves LLM reasoning with lightweight visual modules (for object recognition, spatial checks, etc.). By iteratively calling the right tool and refining its chain-of-thought, a 7B-parameter agent achieved +10.3 and +6.0 point gains on visual reasoning benchmarks, matching or beating models many times larger (fifth sketch below). This underscores that future autonomous agents will heavily incorporate tool use and modular sub-routines - from databases to vision APIs - to enhance accuracy and efficiency.
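
To make the memory design in the first bullet concrete, here is a minimal Python sketch of an agent memory that pairs episodic records with distilled semantic rules. Everything in it - the AgentMemory class, the token-overlap retrieval, the example strings - is an illustrative assumption, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Toy two-tier memory: episodic (instance-specific) plus semantic (generalized)."""
    episodic: list = field(default_factory=list)   # raw (task, outcome, critique) records
    semantic: list = field(default_factory=list)   # distilled rules that generalize across tasks

    def store_episode(self, task: str, outcome: str, critique: str) -> None:
        self.episodic.append({"task": task, "outcome": outcome, "critique": critique})

    def distill_rule(self, rule: str) -> None:
        # In a real system, this reflection step would itself be performed by the LLM.
        self.semantic.append(rule)

    def retrieve(self, task: str, k: int = 2) -> dict:
        # Naive token-overlap scoring stands in for embedding-based retrieval.
        def score(text: str) -> int:
            return len(set(task.lower().split()) & set(text.lower().split()))
        episodes = sorted(self.episodic, key=lambda e: score(e["task"]), reverse=True)[:k]
        return {"episodes": episodes, "rules": list(self.semantic)}

memory = AgentMemory()
memory.store_episode("book a flight to Berlin", "failed", "check the date format before submitting")
memory.distill_rule("Always validate form fields before clicking submit.")
print(memory.retrieve("book a train to Berlin"))
```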
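
The “thought communication” idea in the second bullet can be pictured as agents exposing a few latent coordinates to each other instead of serializing everything into text. The toy functions below are purely illustrative assumptions; the paper works with learned hidden states and an identifiability analysis that this sketch does not attempt to reproduce.

```python
import random

DIM = 8  # size of each agent's toy hidden state

def random_thought(dim: int = DIM) -> list[float]:
    return [random.uniform(-1, 1) for _ in range(dim)]

def share_thought(sender: list[float], shared_dims: list[int]) -> dict[int, float]:
    # Only a subset of latent dimensions (the "shared ideas") is exposed to the other agent.
    return {i: sender[i] for i in shared_dims}

def integrate(receiver: list[float], message: dict[int, float], weight: float = 0.5) -> list[float]:
    # The receiver blends the shared coordinates into its own hidden state.
    out = list(receiver)
    for i, value in message.items():
        out[i] = (1 - weight) * out[i] + weight * value
    return out

agent_a, agent_b = random_thought(), random_thought()
message = share_thought(agent_a, shared_dims=[0, 3, 5])
agent_b = integrate(agent_b, message)
print(message, agent_b)
```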
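
For the SALT-style credit assignment in the third bullet, the rough intuition is that steps shared by successful trajectories should earn positive per-step credit while steps that mostly appear in failures earn negative credit. The scoring rule and the tiny WebShop-like trajectories below are stand-ins, not the paper's formulation.

```python
from collections import defaultdict

# Toy per-step credit assignment from a pool of trajectories: count how often
# each action occurs on successful vs. failed runs and turn that into a reward.
trajectories = [
    {"actions": ["search", "click_item", "buy"], "success": True},
    {"actions": ["search", "click_ad", "buy"], "success": False},
    {"actions": ["search", "click_item", "compare", "buy"], "success": True},
]

counts = defaultdict(lambda: {"pos": 0, "neg": 0})
for traj in trajectories:
    for action in traj["actions"]:
        counts[action]["pos" if traj["success"] else "neg"] += 1

def step_reward(action: str) -> float:
    c = counts[action]
    return (c["pos"] - c["neg"]) / (c["pos"] + c["neg"])

for action in counts:
    print(action, round(step_reward(action), 2))
```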
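
The Memo idea in the fourth bullet can be sketched as periodic compression: old observations are collapsed into compact summaries while recent steps stay at full resolution, so the policy's context grows much more slowly than the raw observation stream. The summarize function here is a trivial placeholder for a learned encoder.

```python
def summarize(chunk: list[str]) -> str:
    # Placeholder for a learned summarizer that maps a chunk of steps to one embedding.
    return f"<summary of {len(chunk)} steps: {chunk[0]} ... {chunk[-1]}>"

def build_context(observations: list[str], chunk_size: int = 4, keep_recent: int = 3) -> list[str]:
    old, recent = observations[:-keep_recent], observations[-keep_recent:]
    summaries = [summarize(old[i:i + chunk_size]) for i in range(0, len(old), chunk_size)]
    return summaries + recent   # compact summaries of the past + full-resolution recent steps

observations = [f"obs_{t}" for t in range(20)]
print(build_context(observations))
```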
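
Finally, the tool-using visual agent in the fifth bullet follows a familiar loop: a planner (an LLM in the real system, a stub here) decides at each step whether to call a lightweight visual module or to answer. The tool names and the fake_planner below are hypothetical placeholders, not the paper's modules.

```python
def detect_objects(image) -> str:
    return "objects: [cup, laptop]"          # stand-in for an object-recognition module

def check_spatial(image) -> str:
    return "cup is left of laptop"           # stand-in for a spatial-relation module

TOOLS = {"detect_objects": detect_objects, "check_spatial": check_spatial}

def fake_planner(question: str, scratchpad: list[str]) -> str:
    # A real system would prompt the LLM with the question plus all tool outputs so far.
    if not scratchpad:
        return "CALL detect_objects"
    if len(scratchpad) == 1:
        return "CALL check_spatial"
    return "ANSWER the cup is to the left of the laptop"

def run_agent(question: str, image=None, max_steps: int = 5) -> str:
    scratchpad: list[str] = []
    for _ in range(max_steps):
        decision = fake_planner(question, scratchpad)
        if decision.startswith("ANSWER"):
            return decision.removeprefix("ANSWER ").strip()
        tool_name = decision.removeprefix("CALL ").strip()
        scratchpad.append(TOOLS[tool_name](image))   # interleave tool output with reasoning
    return "no answer within step budget"

print(run_agent("Where is the cup relative to the laptop?"))
```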

Following the executive summary, we take a closer look at each of these contributions, examining their core innovations, why they matter for autonomous AI, the problems they address, and their future implications.
