In this issue:
AlphaFold: third time’s the charm
xLSTM: bringing back a classic
Evaluating LLMs for evaluating LLMs
1. Accurate structure prediction of biomolecular interactions with AlphaFold 3
Watching: AlphaFold 3 (paper)
What problem does it solve? AlphaFold 2 revolutionized the field of protein structure prediction, enabling highly accurate modeling of individual proteins and protein complexes. However, proteins often interact with other types of molecules, such as nucleic acids (DNA and RNA), small molecules (ligands), ions, and modified residues. Accurately predicting these interactions is crucial for understanding biological processes and designing new drugs. AlphaFold 3 addresses this challenge by extending the capabilities of the model to jointly predict the structure of complexes involving proteins and these other types of molecules.
How does it solve the problem? AlphaFold 3 introduces a substantially updated architecture in which AlphaFold 2's structure module is replaced by a diffusion module that directly predicts raw atom coordinates. Diffusion models have shown impressive results in generating high-quality images and have recently been applied to protein structure prediction; by leveraging this framework, AlphaFold 3 can model the interactions between proteins and various other molecules within a single unified deep learning system. The model demonstrates significantly improved accuracy over previous specialized tools, such as state-of-the-art docking tools for protein-ligand interactions, nucleic-acid-specific predictors for protein-nucleic acid interactions, and AlphaFold-Multimer v2.3 for antibody-antigen prediction.
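To give a rough intuition for how a diffusion-based structure generator works, here is a minimal, hypothetical sketch of a reverse-diffusion sampling loop over 3D atom coordinates. The `denoiser` callable, the log-spaced noise schedule, and all names are illustrative assumptions, not AlphaFold 3's actual diffusion module.

```python
import torch

def sample_structure(denoiser, conditioning, num_atoms, steps=200):
    """Illustrative reverse-diffusion sampler over 3D atom coordinates.

    `denoiser(x_noisy, sigma, conditioning)` stands in for a trained network
    that predicts denoised coordinates given conditioning embeddings; the
    noise schedule below is an assumption made for this sketch.
    """
    sigmas = torch.logspace(1.5, -2, steps)        # high noise -> low noise
    x = sigmas[0] * torch.randn(num_atoms, 3)      # start from pure Gaussian noise
    for i in range(steps - 1):
        sigma, sigma_next = sigmas[i], sigmas[i + 1]
        x0_hat = denoiser(x, sigma, conditioning)  # predict clean coordinates
        d = (x - x0_hat) / sigma                   # direction toward the prediction
        x = x + (sigma_next - sigma) * d           # Euler step to the next noise level
    return x                                       # final predicted structure
```

In the real model, the denoiser is conditioned on the network trunk's representation of the full complex (protein chains, nucleic acids, ligands, ions, and so on), which is what lets a single sampling procedure handle all of these molecule types.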
What's next? AlphaFold 3 could be used to identify new drug targets, design more effective drugs, and gain a deeper understanding of complex biological processes. As the model continues to improve and incorporate additional types of molecules, it may become an essential tool for researchers across various fields, from structural biology to pharmacology. Furthermore, the success of AlphaFold 3 demonstrates the potential of diffusion-based models in solving complex scientific problems, which may inspire further advancements in machine learning and its applications in the life sciences.
2. xLSTM: Extended Long Short-Term Memory
Watching: xLSTM (paper)
What problem does it solve? While the Transformer architecture has become the de facto standard for Large Language Models (LLMs) in recent years, it's worth remembering that LSTMs were the building blocks of the first large neural language models. The main advantage of Transformers over LSTMs is parallelization: self-attention lets them process all positions of a sequence at once during training, whereas an LSTM has to step through it token by token. Still, LSTMs have desirable properties, such as their ability to capture long-term dependencies with a compact recurrent state, which raises the question: can LSTMs be scaled to billions of parameters, with modern techniques mitigating their known limitations, so that they become competitive with Transformers?
How does it solve the problem? The researchers introduce several modifications to the standard LSTM architecture to create xLSTM. First, they introduce exponential gating, paired with appropriate normalization and stabilization techniques, to improve how the network stores and revises information. Second, they modify the LSTM memory structure in two ways: (i) sLSTM, with a scalar memory, a scalar update rule, and new memory mixing, and (ii) mLSTM, which is fully parallelizable thanks to a matrix memory and a covariance update rule. These LSTM extensions are integrated into residual block backbones and stacked to form the xLSTM architecture. Together, the exponential gating and modified memory structures allow xLSTMs to perform favorably compared to state-of-the-art Transformers and State Space Models.
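As a concrete illustration of the exponential-gating idea, below is a minimal scalar sketch of a stabilized sLSTM-style update step in Python. It uses exponential input and forget gates with a max-based stabilizer state and a normalizer, but omits memory mixing, the matrix-memory mLSTM, and the residual block structure; the names and the tiny usage loop are assumptions for illustration, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def slstm_step(z, i_pre, f_pre, o_pre, c, n, m):
    """One scalar sLSTM-style step with stabilized exponential gating.

    z: candidate input; i_pre / f_pre / o_pre: gate pre-activations;
    (c, n, m): cell state, normalizer state, and stabilizer state.
    """
    m_new = max(f_pre + m, i_pre)          # stabilizer keeps the exponents bounded
    i = np.exp(i_pre - m_new)              # stabilized exponential input gate
    f = np.exp(f_pre + m - m_new)          # stabilized exponential forget gate
    c_new = f * c + i * z                  # cell state update
    n_new = f * n + i                      # normalizer accumulates gate mass
    h = sigmoid(o_pre) * (c_new / n_new)   # output gate on the normalized cell
    return h, c_new, n_new, m_new

# Tiny usage example over a two-step sequence (all values are made up)
h, c, n, m = 0.0, 0.0, 0.0, 0.0
for z, i_pre, f_pre, o_pre in [(0.5, 1.0, 2.0, 0.0), (-0.3, 0.5, 1.5, 0.2)]:
    h, c, n, m = slstm_step(z, i_pre, f_pre, o_pre, c, n, m)
```

The normalizer n is what makes the unbounded exponential gates usable: the output is the cell state divided by the accumulated gate activations, so it stays in a reasonable range even when individual gate values are large.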
What's next? The xLSTM architecture demonstrates that with appropriate modifications and scaling, LSTMs can still be competitive with modern Transformer-based models. This opens up new avenues for research into the potential of LSTMs and other recurrent architectures in the context of LLMs. It will be interesting to see if xLSTMs can be further improved and if they can be applied to a wider range of tasks beyond language modeling. Additionally, the techniques introduced in this paper, such as exponential gating and modified memory structures, could potentially be adapted to other architectures to enhance their performance and scaling properties.
3. Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Watching: Prometheus 2 (paper/code)
What problem does it solve? Evaluating the quality of outputs from Large Language Models (LLMs) is a challenging task. Proprietary models like GPT-4 are commonly used as judges for this purpose, but they come with limitations in terms of transparency, controllability, and cost. Existing open-source evaluation models, on the other hand, have their own shortcomings: they often produce scores that diverge significantly from human judgments, and they lack the flexibility to perform both direct assessment and pairwise ranking. Moreover, they are limited to evaluating general attributes and cannot handle custom evaluation criteria.
How does it solve the problem? Prometheus 2 is designed to address the limitations of existing open-source evaluator LMs. It aligns more closely with human and GPT-4 judgments than prior open evaluators, yielding more reliable and consistent scores. It also supports both direct assessment (scoring a single response against a rubric) and pairwise ranking (choosing the better of two responses), which the authors achieve by training evaluators on each format and merging their weights, so users can pick the evaluation approach that best suits their needs. Furthermore, Prometheus 2 can evaluate against user-defined criteria, enabling customized assessments beyond general attributes like helpfulness and harmlessness.
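To make the two formats concrete, here is a small, hypothetical sketch of how prompts for direct assessment and pairwise ranking with a user-defined rubric might be constructed. The wording, function names, and score format are illustrative assumptions, not Prometheus 2's actual prompt templates or API.

```python
def direct_assessment_prompt(instruction, response, rubric):
    """Ask an evaluator LM to score one response against a custom rubric."""
    return (
        f"### Instruction:\n{instruction}\n\n"
        f"### Response to evaluate:\n{response}\n\n"
        f"### Scoring rubric:\n{rubric}\n\n"
        "Write brief feedback, then end with 'Score: <1-5>'."
    )

def pairwise_ranking_prompt(instruction, response_a, response_b, rubric):
    """Ask an evaluator LM to pick the better of two responses."""
    return (
        f"### Instruction:\n{instruction}\n\n"
        f"### Response A:\n{response_a}\n\n"
        f"### Response B:\n{response_b}\n\n"
        f"### Criteria:\n{rubric}\n\n"
        "Write brief feedback, then end with 'Winner: A' or 'Winner: B'."
    )

# Example with a custom, user-defined criterion (values are made up)
rubric = "Does the response stay factually grounded in the provided context?"
print(direct_assessment_prompt(
    "Summarize the key finding of the attached paper.",
    "The paper proposes a unified open-source evaluator LM ...",
    rubric,
))
```

Either prompt would then be sent to the evaluator model through whatever inference stack you use, with the final line parsed out as the verdict.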
What's next? With its improved alignment with human judgments and enhanced flexibility, Prometheus 2 has the potential to become a valuable tool for researchers and practitioners in the field. Future work could focus on expanding its capabilities to handle a wider range of evaluation tasks, and exploring its application in real-world scenarios. Additionally, the open-source nature of Prometheus 2 encourages collaboration and contributions from the community, fostering the development of even more advanced and reliable evaluation models.
Papers of the Week:
Conformal Prediction for Natural Language Processing: A Survey
When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards
Efficient and Economic Large Language Model Inference with Attention Offloading
Large Language Models for Cyber Security: A Systematic Literature Review
Critical Infrastructure Protection: Generative AI, Challenges, and Opportunities