In this issue:
AI agents in a few lines of code
An introduction to Graph Neural Networks
LLMs for complex medical reasoning
1. 🤗 smolagents - a smol library to build great agents!
What problem does it solve? Building agents that can interact with the world and perform tasks has been a long-standing goal in AI research. However, creating such agents often requires complex architectures and a lot of code. smolagents aims to simplify the process of building agents by providing a lightweight and easy-to-use framework. It focuses on supporting code agents, which write their actions in code, and provides seamless integration with the Hugging Face Hub for sharing and loading tools.
How does it solve the problem? smolagents takes a minimalist approach: the core agent logic fits in roughly a thousand lines of code, keeping abstractions to a minimum. Code agents are first-class citizens via the CodeAgent class, which has the model express its actions as executable Python; for security, that code can run in sandboxed environments using E2B. A ToolCallingAgent class is also available for agents that write their actions as JSON or text blobs instead. The framework integrates with the Hugging Face Hub, enabling users to easily share and load tools, and it supports a wide range of LLMs, including models hosted on the Hub as well as models from OpenAI, Anthropic, and others via the LiteLLM integration.
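To make the "agents that write their actions as code" idea concrete, here is a minimal, self-contained sketch of the loop such a framework runs internally. This is not the smolagents API: the `llm` callable, the `final_answer` convention, and the bare `exec` standing in for a real sandbox like E2B are all illustrative assumptions.

```python
def run_code_agent(llm, task, max_steps=5):
    """Toy code-agent loop: ask the model for Python snippets until it
    assigns a `final_answer` variable (an assumed convention, not the
    library's actual one)."""
    history = f"Task: {task}\n"
    for _ in range(max_steps):
        snippet = llm(history)  # the model writes its action as code
        namespace = {}
        # A real framework would execute this in a sandbox (e.g. E2B);
        # a stripped-builtins exec() is only a stand-in here.
        exec(snippet, {"__builtins__": {}}, namespace)
        if "final_answer" in namespace:
            return namespace["final_answer"]
        history += f"\nObservation: {namespace}"
    return None

# A stub "LLM" that solves a toy arithmetic task in one step:
def stub_llm(history):
    return "final_answer = 2 + 2"

result = run_code_agent(stub_llm, "add 2 and 2")
```

The point of the code-as-actions design is that one snippet can chain several tool calls and intermediate computations, where a JSON tool-calling agent would need one round-trip per call.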
What's next? smolagents is positioned as the successor to transformers.agents and will eventually replace it. As the framework matures, we can expect further enhancements and integrations with the Hugging Face ecosystem. The simplicity and flexibility of smolagents make it an attractive choice for researchers and developers working on building agents. It will be interesting to see how the community adopts and extends smolagents to create innovative and powerful agents.
2. Introduction to Graph Neural Networks: A Starting Point for Machine Learning Engineers
Watching: GNNs (paper)
What problem does it solve? Graph-structured data is ubiquitous across many domains, from social networks and biological networks to knowledge graphs and recommender systems. Traditional deep learning approaches, designed for grid-like data such as images or sequences, often struggle to effectively capture the complex relationships and dependencies present in graphs. Graph Neural Networks (GNNs) have emerged as a powerful framework to address this challenge by directly operating on graph-structured data and learning meaningful representations of nodes, edges, and entire graphs.
How does it solve the problem? GNNs solve the problem of learning on graph-structured data by using an encoder-decoder framework. The encoder component of a GNN learns to map the input graph, along with its node and edge features, into a low-dimensional vector representation. This is typically achieved through message passing, where nodes iteratively update their representations by aggregating information from their neighbors. The decoder component then takes these learned representations and uses them for various graph analytic tasks, such as node classification, link prediction, or graph classification. By jointly learning the encoder and decoder, GNNs can automatically extract relevant features from the graph structure and adapt to different tasks.
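The encoder's message-passing step can be sketched in a few lines of NumPy. This is a minimal mean-aggregation variant (in the spirit of a GCN layer), with the self-loop trick and degree normalization as illustrative choices rather than the only option:

```python
import numpy as np

def message_passing_layer(A, H, W):
    """One round of message passing: each node averages the features of
    its neighbors (plus itself), then applies a linear map and ReLU.
    A: (n, n) adjacency matrix, H: (n, d) node features, W: (d, k) weights."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops so a node keeps its own signal
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # normalize by (self-inclusive) degree
    return np.maximum(D_inv @ A_hat @ H @ W, 0.0)  # aggregate, transform, ReLU

# Toy graph: three nodes in a path 0 - 1 - 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.eye(3)          # one-hot node features
W = np.ones((3, 2))    # toy weight matrix
H2 = message_passing_layer(A, H, W)  # new (3, 2) node representations
```

Stacking several such layers lets information travel multiple hops, which is how the encoder builds up node embeddings that a decoder can then use for node classification, link prediction, or graph classification.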
What's next? Despite the impressive progress made by GNNs, there are still several exciting research directions to explore. One area of interest is developing more expressive and powerful GNN architectures that can capture higher-order interactions and long-range dependencies in graphs. Another important challenge is improving the scalability and efficiency of GNNs to handle large-scale graphs with millions or billions of nodes and edges. Additionally, there is a need for more interpretable and explainable GNNs that can provide insights into the learned representations and decision-making process.
3. HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Watching: HuatuoGPT-o1 (paper/code)
What problem does it solve? While large language models (LLMs) have shown impressive performance in various domains, their reasoning capabilities in specialized fields like medicine remain underexplored. Medical reasoning poses unique challenges compared to mathematical reasoning, as it requires not only robust logical thinking but also adherence to the high standards of healthcare. Verifying the correctness of medical reasoning is crucial, yet it is more complex than verifying mathematical proofs. This study addresses the need for enhancing medical reasoning in LLMs while ensuring the reliability of their outputs.
How does it solve the problem? The researchers propose a two-stage approach to improve medical reasoning in LLMs. First, they introduce verifiable medical problems along with a medical verifier that can check the correctness of the model's outputs. This verifiable nature of the problems allows for the creation of a complex reasoning trajectory, which is used to fine-tune the LLMs. Second, they apply reinforcement learning (RL) with rewards based on the medical verifier to further enhance the model's complex reasoning capabilities. By combining these two stages, the researchers develop HuatuoGPT-o1, a medical LLM that demonstrates superior performance in medical problem-solving compared to general and medical-specific baselines, using only 40,000 verifiable problems.
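The key ingredient in both stages is that answers are checkable, so the verifier can be turned into a training signal. Below is a hedged toy sketch of that idea; the `Answer:` output format, the exact-match check, and the 0/1 reward values are illustrative assumptions, not the paper's actual verifier.

```python
def extract_final_answer(response):
    """Take the text after the last 'Answer:' marker (an assumed format)."""
    marker = "Answer:"
    if marker not in response:
        return ""
    return response.rsplit(marker, 1)[-1].strip()

def verifier_reward(response, gold_answer):
    """Toy medical verifier: 1.0 if the model's final answer matches the
    gold label (case-insensitive), else 0.0. An RL loop would use this
    scalar as the reward for the sampled reasoning trajectory."""
    predicted = extract_final_answer(response)
    return 1.0 if predicted.lower() == gold_answer.lower() else 0.0

r = verifier_reward("Step 1: ... Step 2: ... Answer: aspirin", "Aspirin")
```

In stage one, trajectories that score 1.0 would be kept as fine-tuning data; in stage two, the same scalar would drive the RL update.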
What's next? The approach presented in this study can serve as a blueprint for enhancing reasoning in other fields, such as law, finance, or engineering, where reliable and verifiable reasoning is crucial. Furthermore, the integration of domain-specific verifiers and reinforcement learning could lead to the development of more specialized LLMs that excel in complex reasoning tasks within their respective fields. As the demand for reliable and accurate AI assistance grows across various industries, research like this paves the way for creating LLMs that can potentially provide more trustworthy and well-reasoned solutions to domain-specific problems.
Papers of the Week:
CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era
Toward Adaptive Reasoning in Large Language Models with Thought Rollback
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Aviary: training language agents on challenging scientific tasks
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings