AI Agents of the Week: Building an AI Horcrux
Executive Summary
In this week’s batch, researchers tackled core challenges in long-horizon reasoning, memory retention, multi-agent cooperation, self-reflection, and robustness. Key innovations include:
Hierarchical reasoning architectures that break down complex tasks and mitigate misaligned behavior like reward hacking.
New memory frameworks that use a just-in-time approach to maintain long-term context without information loss.
Multi-agent collaboration in latent space, enabling LLMs to share thoughts via embeddings for faster, more accurate joint reasoning.
Tool-oriented agents and orchestrators that combine specialized models and external tools to solve complex tasks at a fraction of the cost of a single giant model.
Self-improving agents that critique and refine their own outputs or strategies using reinforcement learning and tool feedback, without human intervention.
Safety mechanisms for autonomous agents, from detecting prompt injection attacks in web-browsing agents to interpretable task decomposition that reveals reward hacking.
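To make the self-improvement idea above concrete, here is a minimal sketch of a critique-and-refine loop. The `draft` and `critique` functions are toy stand-ins (a deterministic checker plays the role of tool or reinforcement feedback); none of the names come from the papers themselves, and a real agent would call an LLM in both roles.

```python
def draft(task, feedback=None):
    """Toy generator: produces an answer, revising it when feedback is given."""
    if feedback is None:
        return f"DRAFT: answer to {task!r}"
    return f"final: answer to {task!r}, revised after feedback ({feedback})"

def critique(answer):
    """Toy critic: returns feedback if the answer looks unfinished, else None."""
    return "still marked as a draft" if answer.startswith("DRAFT") else None

def self_refine(task, max_rounds=3):
    """Generate, critique, and regenerate until the critic is satisfied."""
    answer = draft(task)
    for _ in range(max_rounds):
        feedback = critique(answer)
        if feedback is None:        # critic is satisfied -> stop early
            break
        answer = draft(task, feedback)  # regenerate using the agent's own critique
    return answer

print(self_refine("summarize this week's agent papers"))
```

The key design point this illustrates is the closed loop: the agent's output is fed back through a critic, and the critique itself becomes part of the next generation step, with no human in the loop.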
In the sections below, we summarize and analyze each paper's core contributions, the problems they address, why they matter, and their potential future implications. Overall, the trend is toward agents that are more structured, memory-rich, collaborative, tool-equipped, self-correcting, and secure, moving us closer to robust autonomous AI systems.
