State of AI Agents: What OpenAI & Google Are Planning
Two trends that will dominate the coming months: agent orchestration and resource optimization
Welcome to the first article of this new series! We’re one week into April, and there’s already a lot to talk about. From major product launches to strategic shifts to research insights, things aren’t slowing down.
Here’s what we’ll cover in this issue:
OpenAI's Architectural Shift
Google's Agent Architecture
Approaches to Multi-Agent Orchestration
Resource Optimization for AI Agents
New Evaluation Methodologies for Agent Systems
Let's dive into all of these topics together, step by step.
1. OpenAI's Architectural Shift: From Assistants to Composable Agent Building Blocks
If you didn’t hear about it last month, you’ll hear about it now: OpenAI recently unveiled two major new developer offerings, the Responses API and the Agents SDK. These releases represent a fundamental shift in OpenAI's strategy, moving from pre-built assistants to providing modular building blocks for creating custom agents.
With the Responses API, they go from monolithic assistants to composable intelligence. It unifies what was fragmented until now, merging the simplicity of the Chat Completions API with the tool-using capabilities previously exclusive to the Assistants API. But there’s more behind this seemingly simple consolidation: the creation of a unified data structure for all agent interactions.
The API’s unified design simplifies how information is processed and passed between components. It reduces the complexity of handling different types of responses and interactions: tasks that previously required orchestrating multiple API calls can now be handled through a single endpoint, significantly reducing latency and code complexity.
The API also comes with several built-in tools that extend its capabilities beyond mere conversation. Web Search allows agents to retrieve up-to-date information with properly attributed sources. File Search provides an optimized retrieval mechanism for efficiently extracting relevant information from large document sets. Perhaps most ambitious is the Computer Use tool (in research preview), which enables agents to interact with computer interfaces by generating mouse and keyboard actions, opening new possibilities for automating desktop workflows.
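To make the single-endpoint design concrete, here is a minimal sketch of a Responses API call with the built-in web search tool. It assumes the `openai` Python package and an `OPENAI_API_KEY` in the environment; the model name and the `web_search_preview` tool type follow OpenAI's launch documentation and may change.

```python
# One request payload covers chat plus built-in tools -- no separate
# assistants, threads, or runs to orchestrate.
REQUEST = {
    "model": "gpt-4o",
    "input": "What were the biggest AI agent announcements this week? Cite sources.",
    "tools": [{"type": "web_search_preview"}],
}

def ask_with_web_search() -> str:
    # Imported lazily so this sketch loads even without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.responses.create(**REQUEST)
    # output_text aggregates the model's answer, with source attributions
    # produced by the web search tool embedded in the response.
    return response.output_text
```

What previously took a chain of calls (create a thread, add a message, start a run, poll for completion) collapses into the single `responses.create` call above.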
Complementing the Responses API is the open-source Agents SDK, which addresses another crucial challenge in building sophisticated agent systems: coordinating multiple specialized agents working together. The SDK's architecture centers around four key technical components: LLMs as Agents (configuring language models with specific instructions and tools), Intelligent Handoffs (managing the transfer of control between different agents), Configurable Guardrails (implementing safety checks), and Tracing & Observability (visualizing execution flow for debugging).
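The handoff pattern is easiest to see in code. The following is a hedged sketch using the open-source Agents SDK (the `openai-agents` package); the agent names and instructions are invented for illustration, and the API surface follows the SDK's documentation at release.

```python
def build_triage_agent():
    # Imported lazily so this sketch loads even without the SDK installed.
    from agents import Agent

    billing = Agent(
        name="Billing agent",
        instructions="Handle invoice and payment questions.",
    )
    support = Agent(
        name="Support agent",
        instructions="Handle technical troubleshooting.",
    )
    # The triage agent is itself an LLM ("LLMs as Agents"); listing
    # handoffs lets it transfer control to a specialist mid-conversation.
    return Agent(
        name="Triage agent",
        instructions="Route each request to the right specialist.",
        handoffs=[billing, support],
    )

def answer(question: str) -> str:
    from agents import Runner

    result = Runner.run_sync(build_triage_agent(), question)
    return result.final_output
```

Guardrails and tracing attach to the same `Agent` objects, so safety checks and execution-flow visualization come from configuration rather than custom plumbing.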
This underscores a strategic move to enable enterprises to construct sophisticated, multi-agent systems tailored to their unique operational needs. The provision of these granular tools suggests a recognition that businesses require flexible components to deeply integrate AI agents within their existing infrastructure and processes.
2. Google's Agent Architecture: Reasoning-First Approach with Vertex AI
Google's recent announcements focused on two complementary technologies: the Gemini 2.5 Pro Experimental model with advanced reasoning capabilities and the general availability of the Vertex AI Agent Engine.
Google's approach reveals a different architectural vision, one centered on reasoning as the foundation of agency. Their Gemini 2.5 Pro model introduces enhanced reasoning modules within the transformer architecture, creating a system that excels not merely at pattern recognition but at logical inference. The model demonstrates state-of-the-art performance on benchmarks for math, science, and code generation.
In typical Gemini fashion, it features a one-million-token context window, with plans to expand to two million. The positive community feedback so far suggests that Google wasn’t wrong when they claimed that Gemini 2.5 Pro was setting new standards for what an agent can comprehend at once. This allows it to process enormous amounts of information in a single prompt, including entire code repositories or comprehensive documentation, which is particularly valuable for agent applications that require understanding large volumes of contextual information.
Their announcement that all future AI models will incorporate built-in reasoning capabilities indicates that they're baking these capabilities into the foundational architecture rather than treating them as optional add-ons. The Vertex AI Agent Engine (previously known as LangChain on Vertex AI) complements this reasoning-first approach by addressing the engineering challenges of building and deploying AI agents at scale.
The technical architecture of the engine focuses on three key capabilities: First, seamless connectivity abstracts away the complexity of integration, providing standardized interfaces for connecting AI models to APIs and databases. Second, built-in Retrieval-Augmented Generation capabilities allow agents to supplement their parametric knowledge with information retrieved from external sources, improving accuracy and reducing hallucinations. Third, Python function integration extends model capabilities using code as intermediaries, enabling actions like database queries, document retrieval, and API interactions.
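The Python function integration is the most tangible of the three: an ordinary function becomes a tool the model can call. Below is a hedged sketch based on the preview-era `reasoning_engines` module (the Agent Engine was previously LangChain on Vertex AI); the module path, class name, and model ID may differ in the GA release, and `get_order_status` with its stubbed data is invented for illustration.

```python
def get_order_status(order_id: str) -> dict:
    """Look up an order's shipping status (stubbed data for illustration)."""
    _ORDERS = {"A-100": "shipped", "A-101": "processing"}
    return {"order_id": order_id, "status": _ORDERS.get(order_id, "unknown")}

def build_agent():
    # Imported lazily so this sketch loads even without the Vertex AI SDK.
    from vertexai.preview import reasoning_engines

    # The engine inspects the function's signature and docstring to decide
    # when to call it -- no manual schema definition required.
    return reasoning_engines.LangchainAgent(
        model="gemini-2.5-pro-exp-03-25",  # model ID is an assumption
        tools=[get_order_status],
    )
```

The same agent object can then be deployed to the managed runtime, which handles the connectivity and scaling concerns described above.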
By offering these components as a cohesive platform rather than separate tools, Google is lowering the technical barriers to building sophisticated agent systems, potentially enabling enterprises with less specialized AI knowledge to create effective agent applications.
3. Approaches to Multi-Agent Orchestration
As we move beyond single-agent paradigms, we encounter what might be called the orchestration dilemma: how to coordinate multiple specialized intelligences into a coherent system greater than the sum of its parts. In the current landscape, there are several orchestration frameworks to highlight, each taking a different technical approach to this fundamental problem.
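Before surveying specific frameworks, the shared shape of the problem can be sketched in a few lines: a coordinator receives a task and routes it to the specialist best suited to handle it. Real frameworks replace the keyword router below with an LLM and the lambdas with full agents; all names here are invented for illustration.

```python
from typing import Callable

# A "specialist" is anything that maps a task to a result.
Specialist = Callable[[str], str]

SPECIALISTS: dict[str, Specialist] = {
    "code":     lambda task: f"[coder] drafting a patch for: {task}",
    "research": lambda task: f"[researcher] gathering sources on: {task}",
    "write":    lambda task: f"[writer] composing a summary of: {task}",
}

def orchestrate(task: str) -> str:
    """Route the task to the first matching specialist; default to the writer."""
    for keyword, specialist in SPECIALISTS.items():
        if keyword in task.lower():
            return specialist(task)
    return SPECIALISTS["write"](task)

print(orchestrate("Research recent agent benchmarks"))
# -> [researcher] gathering sources on: Research recent agent benchmarks
```

Every framework in this space is, at bottom, a more sophisticated answer to the two questions this toy makes explicit: who decides the routing, and what gets passed between agents.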