{"title":"A predictive machine learning force-field framework for liquid electrolyte development","authors":"Sheng Gong, Yumin Zhang, Zhenliang Mu, Zhichen Pu, Hongyi Wang, Xu Han, Zhiao Yu, Mengyi Chen, Tianze Zheng, Zhi Wang, Lifei Chen, Zhenze Yang, Xiaojie Wu, Shaochen Shi, Weihao Gao, Wen Yan, Liang Xiang","doi":"10.1038/s42256-025-01009-7","DOIUrl":"https://doi.org/10.1038/s42256-025-01009-7","url":null,"abstract":"<p>Despite the widespread applications of machine learning force fields (MLFFs) in solids and small molecules, there is a notable gap in applying MLFFs to simulate liquid electrolytes—a critical component of current commercial lithium-ion batteries. Here we introduce ByteDance Artificial intelligence Molecular simulation Booster (BAMBOO), a predictive framework for molecular dynamics simulations, with a demonstration of its capability in the context of liquid electrolytes for lithium batteries. We design a physics-inspired graph equivariant transformer architecture as the backbone of BAMBOO to learn from quantum mechanical simulations. Additionally, we introduce an ensemble knowledge distillation approach and apply it to MLFFs to reduce the fluctuation of observations from molecular dynamics simulations. Finally, we propose a density alignment algorithm to align BAMBOO with experimental measurements. BAMBOO demonstrates state-of-the-art accuracy in predicting key electrolyte properties such as density, viscosity and ionic conductivity across various solvents and salt combinations. 
The current model, trained on more than 15 chemical species, achieves an average density error of 0.01 g cm<sup>−3</sup> on various compositions compared with experiment.</p>","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"55 1","pages":""},"PeriodicalIF":23.8,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143744769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"InstaNovo enables diffusion-powered de novo peptide sequencing in large-scale proteomics experiments","authors":"Kevin Eloff, Konstantinos Kalogeropoulos, Amandla Mabona, Oliver Morell, Rachel Catzel, Esperanza Rivera-de-Torre, Jakob Berg Jespersen, Wesley Williams, Sam P. B. van Beljouw, Marcin J. Skwark, Andreas Hougaard Laustsen, Stan J. J. Brouns, Anne Ljungars, Erwin M. Schoof, Jeroen Van Goey, Ulrich auf dem Keller, Karim Beguir, Nicolas Lopez Carranza, Timothy P. Jenkins","doi":"10.1038/s42256-025-01019-5","DOIUrl":"https://doi.org/10.1038/s42256-025-01019-5","url":null,"abstract":"<p>Mass spectrometry-based proteomics focuses on identifying the peptide that generates a tandem mass spectrum. Traditional methods rely on protein databases but are often limited or inapplicable in certain contexts. De novo peptide sequencing, which assigns peptide sequences to spectra without prior information, is valuable for diverse biological applications; however, owing to a lack of accuracy, it remains challenging to apply. Here we introduce InstaNovo, a transformer model that translates fragment ion peaks into peptide sequences. We demonstrate that InstaNovo outperforms state-of-the-art methods and showcase its utility in several applications. We also introduce InstaNovo+, a diffusion model that improves performance through iterative refinement of predicted sequences. Using these models, we achieve improved therapeutic sequencing coverage, discover novel peptides and detect unreported organisms in diverse datasets, thereby expanding the scope and detection rate of proteomics searches. 
Our models unlock opportunities across domains such as direct protein sequencing, immunopeptidomics and exploration of the dark proteome.</p>","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"69 1","pages":""},"PeriodicalIF":23.8,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143736615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A text-guided protein design framework","authors":"Shengchao Liu, Yanjing Li, Zhuoxinran Li, Anthony Gitter, Yutao Zhu, Jiarui Lu, Zhao Xu, Weili Nie, Arvind Ramanathan, Chaowei Xiao, Jian Tang, Hongyu Guo, Anima Anandkumar","doi":"10.1038/s42256-025-01011-z","DOIUrl":"https://doi.org/10.1038/s42256-025-01011-z","url":null,"abstract":"<p>Current AI-assisted protein design utilizes mainly protein sequential and structural information. Meanwhile, there exists tremendous knowledge curated by humans in text format describing proteins’ high-level functionalities, yet whether the incorporation of such text data can help in protein design tasks has not been explored. To bridge this gap, we propose ProteinDT, a multimodal framework that leverages textual descriptions for protein design. ProteinDT consists of three consecutive steps: ProteinCLAP, which aligns the representation of two modalities, a facilitator that generates the protein representation from the text modality and a decoder that creates the protein sequences from the representation. To train ProteinDT, we construct a large dataset, SwissProtCLAP, with 441,000 text and protein pairs. 
We quantitatively verify the effectiveness of ProteinDT on three challenging tasks: (1) over 90% accuracy for text-guided protein generation; (2) best hit ratio on 12 zero-shot text-guided protein editing tasks; (3) superior performance on four out of six protein property prediction benchmarks.</p>","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"35 1","pages":""},"PeriodicalIF":23.8,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143712763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A disease-specific language model for variant pathogenicity in cardiac and regulatory genomics","authors":"Huixin Zhan, Jason H. Moore, Zijun Zhang","doi":"10.1038/s42256-025-01016-8","DOIUrl":"https://doi.org/10.1038/s42256-025-01016-8","url":null,"abstract":"<p>Clinical variant classification of pathogenic versus benign genetic variants remains a challenge in genetics. Current genomic foundation models have enhanced variant effect prediction (VEP) accuracy through weakly supervised or unsupervised training, yet these models lack disease specificity. Here, to address this, we propose DYNA (disease-specificity fine-tuning via a Siamese neural network), broadly applicable to all genomic foundation models for more effective VEPs in disease contexts. We applied DYNA to the coding VEP in cardiovascular diseases and the non-coding VEP of RNA splicing regulation. These two tasks cover a wide range of specific disease–gene relationships and disease-causing regulatory mechanisms; therefore, their performance will inform the general utility of DYNA. In both cases, DYNA fine-tunes various pretrained genomic foundation models on small rare-variant sets. The DYNA fine-tuned models show superior performance in held-out rare-variant test sets and are further replicated in large, clinically relevant variant annotations in ClinVar. Importantly, we observed that different genomic foundation models excel at different downstream VEP tasks, necessitating a universal tool such as DYNA to fully harness the power of genomic foundation models. 
Thus, DYNA offers a potent disease-specific VEP method for clinical variant interpretation.</p>","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"59 1","pages":""},"PeriodicalIF":23.8,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143678046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transparency (in training data) is what we want","authors":"","doi":"10.1038/s42256-025-01023-9","DOIUrl":"10.1038/s42256-025-01023-9","url":null,"abstract":"As more powerful generative AI tools appear on the market, legal debates about the use of copyrighted content to develop such tools are intensifying. To resolve these issues, transparency regarding which copyrighted data have been used and where in the AI training pipeline needs to be a starting point.","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"7 3","pages":"329-329"},"PeriodicalIF":18.8,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s42256-025-01023-9.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143690303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Materiality and risk in the age of pervasive AI sensors","authors":"Mona Sloane, Emanuel Moss, Susan Kennedy, Matthew Stewart, Pete Warden, Brian Plancher, Vijay Janapa Reddi","doi":"10.1038/s42256-025-01017-7","DOIUrl":"10.1038/s42256-025-01017-7","url":null,"abstract":"Artificial intelligence (AI) systems connected to sensor-laden devices are becoming pervasive, which has notable implications for a range of AI risks, including to privacy, the environment, autonomy and more. There is therefore a growing need for increased accountability around the responsible development and deployment of these technologies. Here we highlight the dimensions of risk associated with AI systems that arise from the material affordances of sensors and their underlying calculative models. We propose a sensor-sensitive framework for diagnosing these risks, complementing existing approaches such as the US National Institute of Standards and Technology AI Risk Management Framework and the European Union AI Act, and discuss its implementation. We conclude by advocating for increased attention to the materiality of algorithmic systems, and of on-device AI sensors in particular, and highlight the need for development of a sensor design paradigm that empowers users and communities and leads to a future of increased fairness, accountability and transparency. 
Sloane and colleagues review emerging new dimensions of risks associated with materiality and AI algorithms run on pervasive sensors.","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"7 3","pages":"334-345"},"PeriodicalIF":18.8,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143661528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantum circuit optimization with AlphaTensor","authors":"Francisco J. R. Ruiz, Tuomas Laakkonen, Johannes Bausch, Matej Balog, Mohammadamin Barekatain, Francisco J. H. Heras, Alexander Novikov, Nathan Fitzpatrick, Bernardino Romera-Paredes, John van de Wetering, Alhussein Fawzi, Konstantinos Meichanetzidis, Pushmeet Kohli","doi":"10.1038/s42256-025-01001-1","DOIUrl":"10.1038/s42256-025-01001-1","url":null,"abstract":"A key challenge in realizing fault-tolerant quantum computers is circuit optimization. Focusing on the most expensive gates in fault-tolerant quantum computation (namely, the T gates), we address the problem of T-count optimization, that is, minimizing the number of T gates needed to implement a given circuit. To achieve this, we develop AlphaTensor-Quantum, a method based on deep reinforcement learning that exploits the relationship between optimizing the T-count and tensor decomposition. Unlike existing methods for T-count optimization, AlphaTensor-Quantum can incorporate domain-specific knowledge about quantum computation and leverage gadgets, which substantially reduces the T-count of the optimized circuits. AlphaTensor-Quantum outperforms the existing methods for T-count optimization on a set of arithmetic benchmarks (even when compared without using gadgets). Remarkably, it discovers an efficient algorithm akin to Karatsuba’s method for multiplication in finite fields. AlphaTensor-Quantum also finds the best human-designed solutions for relevant arithmetic computations used in Shor’s algorithm and for quantum chemistry simulation, thus demonstrating that it can save hundreds of hours of research by optimizing relevant quantum circuits in a fully automated way. Ruiz and colleagues introduce AlphaTensor-Quantum, a deep reinforcement learning method for optimizing quantum circuits. 
It outperforms existing methods and is capable of finding the best human-designed solutions for relevant quantum computations in a fully automated way.","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"7 3","pages":"374-385"},"PeriodicalIF":18.8,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s42256-025-01001-1.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143661217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active exploration and reconstruction of vascular networks using microrobot swarms","authors":"Xingzhou Du, Yibin Wang, Junhui Law, Kaiwen Fang, Hui Chen, Yuezhen Liu, Jiangfan Yu","doi":"10.1038/s42256-025-01012-y","DOIUrl":"https://doi.org/10.1038/s42256-025-01012-y","url":null,"abstract":"<p>Angiography is essential in interventional operations to image the vascular network. Passive contrast agents applied in angiography highly rely on the flow direction, making the imaging of upstream regions and embolic branches challenging. Active imaging is demanded for the accurate localization of blockages and lesions in vascular networks. Here an active exploration and reconstruction strategy is proposed, enabling full imaging of three-dimensional (3D) vascular networks with flow and blockage. The strategy implements magnetic particle swarms as active agents, which can be guided on demand towards the desired directions. An image processing unit is developed to capture the 3D position of the swarm inside the vessel. A simultaneous mapping and exploration sequence is proposed to realize the exploration, and the entire structure of the 3D vascular network is reconstructed after obtaining the position data. The proposed strategy is validated in vascular networks with different structures and conditions, and it enables the thorough exploration and reconstruction of regions that cannot be accessed by passive contrast agents. 
This strategy is promising in locating stenoses, thrombi and fistulae in vascular systems.</p>","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"6 1","pages":""},"PeriodicalIF":23.8,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143653956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Embodied large language models enable robots to complete complex tasks in unpredictable environments","authors":"Ruaridh Mon-Williams, Gen Li, Ran Long, Wenqian Du, Christopher G. Lucas","doi":"10.1038/s42256-025-01005-x","DOIUrl":"https://doi.org/10.1038/s42256-025-01005-x","url":null,"abstract":"<p>Completing complex tasks in unpredictable settings challenges robotic systems, requiring a step change in machine intelligence. Sensorimotor abilities are considered integral to human intelligence. Thus, biologically inspired machine intelligence might usefully combine artificial intelligence with robotic sensorimotor capabilities. Here we report an embodied large-language-model-enabled robot (ELLMER) framework, utilizing GPT-4 and a retrieval-augmented generation infrastructure, to enable robots to complete long-horizon tasks in unpredictable settings. The method extracts contextually relevant examples from a knowledge base, producing action plans that incorporate force and visual feedback and enabling adaptation to changing conditions. We tested ELLMER on a robot tasked with coffee making and plate decoration; these tasks consist of a sequence of sub-tasks from drawer opening to pouring, each benefiting from distinct feedback types and methods. We show that the ELLMER framework allows the robot to complete the tasks. 
This demonstration marks progress towards scalable, efficient and ‘intelligent robots’ able to complete complex tasks in uncertain environments.</p>","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"28 1","pages":""},"PeriodicalIF":23.8,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143653955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explainable AI reveals Clever Hans effects in unsupervised learning models","authors":"Jacob Kauffmann, Jonas Dippel, Lukas Ruff, Wojciech Samek, Klaus-Robert Müller, Grégoire Montavon","doi":"10.1038/s42256-025-01000-2","DOIUrl":"10.1038/s42256-025-01000-2","url":null,"abstract":"Unsupervised learning has become an essential building block of artificial intelligence systems. The representations it produces, for example, in foundation models, are critical to a wide variety of downstream applications. It is therefore important to carefully examine unsupervised models to ensure not only that they produce accurate predictions on the available data but also that these accurate predictions do not arise from a Clever Hans (CH) effect. Here, using specially developed explainable artificial intelligence techniques and applying them to popular representation learning and anomaly detection models for image data, we show that CH effects are widespread in unsupervised learning. In particular, through use cases on medical and industrial inspection data, we demonstrate that CH effects systematically lead to significant performance loss of downstream models under plausible dataset shifts or reweighting of different data subgroups. Our empirical findings are enriched by theoretical insights, which point to inductive biases in the unsupervised learning machine as a primary source of CH effects. Overall, our work sheds light on unexplored risks associated with practical applications of unsupervised learning and suggests ways to systematically mitigate CH effects, thereby making unsupervised learning more robust. 
Building on recent explainable AI techniques, this Article highlights the pervasiveness of Clever Hans effects in unsupervised learning and the substantial risks associated with these effects in terms of the prediction accuracy on new data.","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"7 3","pages":"412-422"},"PeriodicalIF":18.8,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s42256-025-01000-2.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143635720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}