Nature Machine Intelligence: Latest Articles, Page 4

A comprehensive large-scale biomedical knowledge graph for AI-powered data-driven biomedical research

IF 23.8 | Tier 1, Computer Science

Nature Machine Intelligence Pub Date: 2025-03-17 DOI: 10.1038/s42256-025-01014-w

Yuan Zhang, Xin Sui, Feng Pan, Kaixian Yu, Keqiao Li, Shubo Tian, Arslan Erdengasileng, Qing Han, Wanjing Wang, Jianan Wang, Jian Wang, Donghu Sun, Henry Chung, Jun Zhou, Eric Zhou, Ben Lee, Peili Zhang, Xing Qiu, Tingting Zhao, Jinfeng Zhang

Abstract: To address the rapid growth of scientific publications and data in biomedical research, knowledge graphs (KGs) have become a critical tool for integrating large volumes of heterogeneous data to enable efficient information retrieval and automated knowledge discovery. However, transforming unstructured scientific literature into KGs remains a significant challenge, with previous methods unable to achieve human-level accuracy. Here we used an information extraction pipeline that won first place in the LitCoin Natural Language Processing Challenge (2022) to construct a large-scale KG named iKraph using all PubMed abstracts. The extracted information matches human expert annotations and significantly exceeds the content of manually curated public databases. To enhance the KG’s comprehensiveness, we integrated relation data from 40 public databases and relation information inferred from high-throughput genomics data. This KG facilitates rigorous performance evaluation of automated knowledge discovery, which was infeasible in previous studies. We designed an interpretable, probabilistic-based inference method to identify indirect causal relations and applied it to real-time COVID-19 drug repurposing from March 2020 to May 2023. Our method identified around 1,200 candidate drugs in the first 4 months, with one-third of those discovered in the first 2 months later supported by clinical trials or PubMed publications. These outcomes are very challenging to attain through alternative approaches that lack a thorough understanding of the existing literature. A cloud-based platform (https://biokde.insilicom.com) was developed for academic users to access this rich structured data and associated tools.
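The abstract describes an interpretable, probability-based method for inferring indirect causal relations but does not give its formula. As a minimal sketch under stated assumptions (each direct edge carries a confidence probability, a two-hop path holds only if both of its edges hold, and paths are independent), an indirect source→target score can be computed by multiplying probabilities along each path and combining the paths with a noisy-OR. The graph, entity names and the `indirect_relation_prob` helper are hypothetical and are not taken from iKraph.

```python
from typing import Dict

# Hypothetical KG layout: graph[source][target] = probability the direct relation holds.
Graph = Dict[str, Dict[str, float]]


def indirect_relation_prob(graph: Graph, source: str, target: str) -> float:
    """Noisy-OR score over all source -> intermediate -> target paths.

    Assumes a path holds only if both of its edges hold (product of edge
    probabilities) and that different paths are independent evidence.
    """
    prob_no_path = 1.0
    for mid, p_first in graph.get(source, {}).items():
        if mid == target:
            continue  # ignore the direct edge; only indirect paths are scored
        p_second = graph.get(mid, {}).get(target, 0.0)
        prob_no_path *= 1.0 - p_first * p_second
    return 1.0 - prob_no_path


if __name__ == "__main__":
    kg: Graph = {
        "drugX": {"gene1": 0.9, "gene2": 0.5},  # hypothetical entities and scores
        "gene1": {"COVID-19": 0.8},
        "gene2": {"COVID-19": 0.4},
    }
    print(round(indirect_relation_prob(kg, "drugX", "COVID-19"), 4))  # 0.776
```

With the example graph the two paths contribute 0.72 and 0.20, giving 1 - 0.28 × 0.80 = 0.776. Noisy-OR lets several weak intermediate paths add up to stronger evidence while keeping each path's contribution inspectable.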

Citations: 0

From data chaos to precision medicine

IF 18.8 | Tier 1, Computer Science

Nature Machine Intelligence Pub Date: 2025-03-13 DOI: 10.1038/s42256-025-01015-9

Alexander Schönhuth

Citations: 0

Transformers and genome language models

IF 18.8 | Tier 1, Computer Science

Nature Machine Intelligence Pub Date: 2025-03-13 DOI: 10.1038/s42256-025-01007-9

Micaela E. Consens, Cameron Dufault, Michael Wainberg, Duncan Forster, Mehran Karimzadeh, Hani Goodarzi, Fabian J. Theis, Alan Moses, Bo Wang

Abstract: Large language models based on the transformer deep learning architecture have revolutionized natural language processing. Motivated by the analogy between human language and the genome’s biological code, researchers have begun to develop genome language models (gLMs) based on transformers and related architectures. This Review explores the use of transformers and language models in genomics. We survey open questions in genomics amenable to the use of gLMs, and motivate the use of gLMs and the transformer architecture for these problems. We discuss the potential of gLMs for modelling the genome using unsupervised pretraining tasks, specifically focusing on the power of zero- and few-shot learning. We explore the strengths and limitations of the transformer architecture, as well as the strengths and limitations of current gLMs more broadly. Additionally, we contemplate the future of genomic modelling beyond the transformer architecture, based on current trends in research. This Review serves as a guide for computational biologists and computer scientists interested in transformers and language models for genomic data.

Micaela Consens et al. discuss and review the recent rise of transformer-based and large language models in genomics. They also highlight promising directions for genome language models beyond the transformer architecture.

Volume 7, Issue 3, pp. 346-362
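The Review discusses pretraining gLMs with unsupervised tasks. One common recipe, used for instance by k-mer-based models such as DNABERT (other gLMs tokenize single nucleotides or use byte-pair encoding), is to split DNA into overlapping k-mers and hide a fraction of tokens for a masked-language-modelling objective. The sketch below shows only that data-preparation step; the function names, `k=6` and the 15% mask rate are illustrative assumptions, not choices made in the Review.

```python
import random
from typing import List, Tuple

MASK = "[MASK]"


def kmer_tokenize(seq: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    seq = seq.upper()
    if len(seq) < k:
        return []
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mask_tokens(tokens: List[str], rate: float = 0.15,
                seed: int = 0) -> Tuple[List[str], List[int]]:
    """Replace a random subset of tokens with [MASK].

    Returns the masked sequence and the masked positions; during pretraining a
    model would be asked to predict the original tokens at those positions.
    """
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * rate)) if tokens else 0
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for pos in positions:
        masked[pos] = MASK
    return masked, positions


if __name__ == "__main__":
    tokens = kmer_tokenize("ACGTACGTAC", k=6)
    print(tokens)  # ['ACGTAC', 'CGTACG', 'GTACGT', 'TACGTA', 'ACGTAC']
    print(mask_tokens(tokens))
```

Because neighbouring overlapping k-mers share k - 1 bases, an isolated masked token can often be read off its neighbours; this is one reason span masking is used in practice, and a fuller sketch would mask contiguous runs of tokens instead.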

Citations: 0

Fast, scale-adaptive and uncertainty-aware downscaling of Earth system model fields with generative machine learning

IF 18.8 | Tier 1, Computer Science

Nature Machine Intelligence Pub Date: 2025-03-13 DOI: 10.1038/s42256-025-00980-5

Philipp Hess, Michael Aich, Baoxiang Pan, Niklas Boers

Abstract: Accurate and high-resolution Earth system model (ESM) simulations are essential to assess the ecological and socioeconomic impacts of anthropogenic climate change, but are computationally too expensive to be run at sufficiently high spatial resolution. Recent machine learning approaches have shown promising results in downscaling ESM simulations, outperforming state-of-the-art statistical approaches. However, existing methods require computationally costly retraining for each ESM and extrapolate poorly to climates unseen during training. We address these shortcomings by learning a consistency model that efficiently and accurately downscales arbitrary ESM simulations without retraining in a zero-shot manner. Our approach yields probabilistic downscaled fields at a resolution only limited by the observational reference data. We show that the consistency model outperforms state-of-the-art diffusion models at a fraction of the computational cost and maintains high controllability on the downscaling task. Further, our method generalizes to climate states unseen during training without explicitly formulated physical constraints.

A generative machine learning approach is proposed to improve the resolution of Earth system models in an efficient, adaptive and uncertainty-aware manner.

Volume 7, Issue 3, pp. 363-373. Open-access PDF: https://www.nature.com/articles/s42256-025-00980-5.pdf

Citations: 0

Data-driven federated learning in drug discovery with knowledge distillation

IF 18.8 | Tier 1, Computer Science

Nature Machine Intelligence Pub Date: 2025-03-05 DOI: 10.1038/s42256-025-00991-2

Thierry Hanser, Ernst Ahlberg, Alexander Amberg, Lennart T. Anger, Chris Barber, Richard J. Brennan, Alessandro Brigo, Annie Delaunois, Susanne Glowienke, Nigel Greene, Laura Johnston, Daniel Kuhn, Lara Kuhnke, Jean-François Marchaland, Wolfgang Muster, Jeffrey Plante, Friedrich Rippmann, Yogesh Sabnis, Friedemann Schmidt, Ruud van Deursen, Stéphane Werner, Angela White, Joerg Wichard, Tomoya Yukawa

Abstract: A main challenge for artificial intelligence in scientific research is ensuring access to sufficient, high-quality data for the development of impactful models. Despite the abundance of public data, the most valuable knowledge often remains embedded within confidential corporate data silos. Although industries are increasingly open to sharing non-competitive insights, such collaboration is often constrained by the confidentiality of the underlying data. Federated learning makes it possible to share knowledge without compromising data privacy, but it has notable limitations. Here, we introduce FLuID (federated learning using information distillation), a data-centric application of federated distillation tailored to drug discovery aiming to preserve data privacy. We validate FLuID in two experiments, first involving public data simulating a virtual consortium and second in a real-world research collaboration between eight pharmaceutical companies. Although the alignment of the models with the partner specific domain remains challenging, the data-driven nature of FLuID offers several avenues to mitigate domain shift. FLuID fosters knowledge sharing among pharmaceutical organizations, paving the way for a new generation of models with enhanced performance and an expanded applicability domain in biological activity predictions.

FLuID enables privacy-preserving knowledge sharing in drug discovery using knowledge distillation. The results show that the approach expands applicability domains and fosters collaboration across organizations without compromising data privacy or security.

Volume 7, Issue 3, pp. 423-436
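The abstract describes FLuID as a data-centric form of federated distillation but does not detail the protocol. As a minimal sketch of generic federated distillation, assume each partner trains a teacher on its private data, the teachers label a shared non-confidential set, the labels are combined by majority vote, and a student is trained only on that shared, relabelled set. The nearest-centroid model, the vote rule and the toy 2-D data are hypothetical stand-ins for real activity models and molecular descriptors.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


class NearestCentroid:
    """Tiny classifier: predicts the label whose training centroid is closest."""

    def __init__(self) -> None:
        self.centroids: Dict[int, Point] = {}

    def fit(self, xs: Sequence[Point], ys: Sequence[int]) -> "NearestCentroid":
        for label in sorted(set(ys)):
            pts = [x for x, y in zip(xs, ys) if y == label]
            self.centroids[label] = (mean(p[0] for p in pts), mean(p[1] for p in pts))
        return self

    def predict(self, x: Point) -> int:
        def sq_dist(label: int) -> float:
            cx, cy = self.centroids[label]
            return (x[0] - cx) ** 2 + (x[1] - cy) ** 2
        return min(self.centroids, key=sq_dist)


def federated_distillation(
    private_sets: Sequence[Tuple[Sequence[Point], Sequence[int]]],
    shared_inputs: Sequence[Point],
) -> Tuple[NearestCentroid, List[int]]:
    # 1. Each partner trains a teacher locally; its private data never leaves the partner.
    teachers = [NearestCentroid().fit(xs, ys) for xs, ys in private_sets]
    # 2. Teachers annotate the shared, non-confidential inputs (binary labels 0/1 here).
    votes = [[t.predict(x) for t in teachers] for x in shared_inputs]
    # 3. Aggregate by majority vote; ties go to label 1 (an arbitrary choice).
    consensus = [1 if 2 * sum(v) >= len(v) else 0 for v in votes]
    # 4. Train the student on the shared inputs and consensus labels only.
    student = NearestCentroid().fit(shared_inputs, consensus)
    return student, consensus


if __name__ == "__main__":
    partners = [
        ([(0, 0), (1, 0), (4, 4), (5, 4)], [0, 0, 1, 1]),
        ([(0, 1), (0, 2), (5, 5), (4, 5)], [0, 0, 1, 1]),
        ([(1, 1), (4, 3)], [0, 1]),
    ]
    shared = [(0.5, 0.5), (4.5, 4.5), (1, 2), (5, 3)]
    student, labels = federated_distillation(partners, shared)
    print(labels)                       # [0, 1, 0, 1]
    print(student.predict((0.2, 0.3)))  # 0
```

Only teacher predictions on the shared inputs leave each partner here. Majority voting is just one aggregation choice; the abstract does not state how FLuID combines annotations.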

Citations: 0

Bridging the gap between machine confidence and human perceptions

IF 18.8 | Tier 1, Computer Science

Nature Machine Intelligence Pub Date: 2025-03-03 DOI: 10.1038/s42256-025-01013-x

Ming Yin

Citations: 0

A unified deep framework for peptide–major histocompatibility complex–T cell receptor binding prediction

IF 23.8 | Tier 1, Computer Science

Nature Machine Intelligence Pub Date: 2025-02-26 DOI: 10.1038/s42256-025-01002-0

Yunxiang Zhao, Jijun Yu, Yixin Su, You Shu, Enhao Ma, Jing Wang, Shuyang Jiang, Congwen Wei, Dongsheng Li, Zhen Huang, Gong Cheng, Hongguang Ren, Jiannan Feng

Abstract: Antigen peptides that are presented by a major histocompatibility complex (MHC) and recognized by a T cell receptor (TCR) have an essential role in immunotherapy. Although substantial progress has been made in predicting MHC presentation, accurately predicting the binding interactions between antigen peptides, MHCs and TCRs remains a major computational challenge. In this paper, we propose a unified deep framework (called UniPMT) for peptide, MHC and TCR binding prediction to predict the binding between the peptide and the CDR3 of TCR β in general, presented by class I MHCs. UniPMT is comprehensively validated by a series of experiments and achieved state-of-the-art performance in the peptide–MHC–TCR, peptide–MHC and peptide–TCR binding prediction tasks with up to 15% improvements in area under the precision–recall curve taking the peptide–MHC–TCR binding prediction task as an example. In practical applications, UniPMT shows strong predictive power, correlates well with T cell clonal expansion and outperforms existing methods in neoantigen-specific binding prediction with up to 17.62% improvements in area under the precision–recall curve on experimentally validated datasets. Moreover, UniPMT provides interpretable insights into the identification of key binding sites and the quantification of peptide–MHC–TCR binding probabilities. In summary, UniPMT shows great potential to serve as a useful tool for antigen peptide discovery, disease immunotherapy and neoantigen vaccine design.
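UniPMT's gains are reported as area under the precision–recall curve (AUPRC). As a minimal sketch of one common way to compute that metric, the step-wise estimate known as average precision, AP = Σₙ (Rₙ - Rₙ₋₁) Pₙ, averages the precision at the rank of each positive example. The labels and scores below are made up, and ties between scores are broken by input order rather than grouped, a simplification that library implementations such as scikit-learn's `average_precision_score` handle more carefully.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC: mean of the precision at the rank of each positive.

    Equals sum_n (R_n - R_{n-1}) * P_n when no two scores are tied.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive examples")
    # Rank examples from highest to lowest score (stable for equal scores).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_pos += 1
            total += true_pos / rank  # precision at this cut-off; recall rises by 1/n_pos
    return total / n_pos


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0]            # hypothetical binding / non-binding labels
    scores = [0.9, 0.8, 0.7, 0.6, 0.2]  # hypothetical model scores
    print(round(average_precision(labels, scores), 3))  # 0.806
```

For these labels the positives sit at ranks 1, 3 and 4, so AP = (1/1 + 2/3 + 3/4) / 3 = 29/36 ≈ 0.806.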

Citations: 0

Large language models for scientific discovery in molecular property prediction

IF 18.8 | Tier 1, Computer Science

Nature Machine Intelligence Pub Date: 2025-02-25 DOI: 10.1038/s42256-025-00994-z

Yizhen Zheng, Huan Yee Koh, Jiaxin Ju, Anh T. N. Nguyen, Lauren T. May, Geoffrey I. Webb, Shirui Pan

Abstract: Large language models (LLMs) are a form of artificial intelligence system encapsulating vast knowledge in the form of natural language. These systems are adept at numerous complex tasks including creative writing, storytelling, translation, question-answering, summarization and computer code generation. Although LLMs have seen initial applications in natural sciences, their potential for driving scientific discovery remains largely unexplored. In this work, we introduce LLM4SD, a framework designed to harness LLMs for driving scientific discovery in molecular property prediction by synthesizing knowledge from literature and inferring knowledge from scientific data. LLMs synthesize knowledge by extracting established information from scientific literature, such as molecular weight being key to predicting solubility. For inference, LLMs identify patterns in molecular data, particularly in Simplified Molecular Input Line Entry System-encoded structures, such as halogen-containing molecules being more likely to cross the blood–brain barrier. This information is presented as interpretable knowledge, enabling the transformation of molecules into feature vectors. By using these features with interpretable models such as random forest, LLM4SD can outperform the current state of the art across a range of benchmark tasks for predicting molecular properties. We foresee it providing interpretable and potentially new insights, aiding scientific discovery in molecular property prediction.

Zheng et al. developed LLM4SD, a framework using large language models to predict molecular properties. The method leverages the ability of large language models to synthesize knowledge from literature and to reason about scientific data with domain expertise.

Volume 7, Issue 3, pp. 437-447
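LLM4SD turns literature-derived and data-derived knowledge, such as "halogen-containing molecules are more likely to cross the blood–brain barrier", into interpretable features for models such as random forests. Below is a minimal sketch of that rule-to-feature step, assuming each rule has already been written as a Python function over a SMILES string. The three rules are hypothetical, the atom count is only a crude stand-in for molecular weight, and the regex tokenizer is deliberately simplistic (a real pipeline would parse SMILES with a cheminformatics toolkit such as RDKit).

```python
import re
from typing import Callable, Dict, List

# Bracket atoms first, then two-letter organic-subset atoms, then one-letter atoms.
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")
HALOGENS = {"F", "Cl", "Br", "I"}


def atoms(smiles: str) -> List[str]:
    """Crude SMILES atom tokenizer returning element symbols (explicit atoms only)."""
    symbols = []
    for token in ATOM_PATTERN.findall(smiles):
        if token.startswith("["):
            # Bracket atom such as [Cl-] or [13CH3]: take the first element symbol inside.
            match = re.search(r"[A-Z][a-z]?|[bcnops]", token)
            symbols.append(match.group(0) if match else token)
        else:
            symbols.append(token)
    return symbols


# Hypothetical rules of the kind an LLM might propose; each maps SMILES -> number.
RULES: Dict[str, Callable[[str], float]] = {
    "has_halogen": lambda s: float(any(a in HALOGENS for a in atoms(s))),
    "atom_count": lambda s: float(len(atoms(s))),  # rough size proxy, not molecular weight
    "nitrogen_count": lambda s: float(sum(a in ("N", "n") for a in atoms(s))),
}


def featurize(smiles: str) -> List[float]:
    """Apply every rule in a fixed order to build an interpretable feature vector."""
    return [rule(smiles) for rule in RULES.values()]


if __name__ == "__main__":
    print(list(RULES))              # feature names, in vector order
    print(featurize("Clc1ccccc1"))  # chlorobenzene -> [1.0, 7.0, 0.0]
```

Each position in the vector keeps a readable rule name, so a downstream model's feature importances map back to statements a chemist can check. The vectors could be passed to, for example, scikit-learn's `RandomForestClassifier`; that step is omitted to keep the sketch dependency-free.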

Citations: 0

Teaching robots to build simulations of themselves

IF 18.8 | Tier 1, Computer Science

Nature Machine Intelligence Pub Date: 2025-02-25 DOI: 10.1038/s42256-025-01006-w

Yuhang Hu, Jiong Lin, Hod Lipson

Abstract: The emergence of vision catalysed a pivotal evolutionary advancement, enabling organisms not only to perceive but also to interact intelligently with their environment. This transformation is mirrored by the evolution of robotic systems, where the ability to leverage vision to simulate and predict their own dynamics marks a leap towards autonomy and self-awareness. Humans utilize vision to record experiences and internally simulate potential actions. For example, we can imagine that, if we stand up and raise our arms, the body will form a ‘T’ shape without physical movement. Similarly, simulation allows robots to plan and predict the outcomes of potential actions without execution. Here we introduce a self-supervised learning framework to enable robots to model and predict their morphology, kinematics and motor control using only brief raw video data, eliminating the need for extensive real-world data collection and kinematic priors. By observing their own movements, akin to humans watching their reflection in a mirror, robots learn an ability to simulate themselves and predict their spatial motion for various tasks. Our results demonstrate that this self-learned simulation not only enables accurate motion planning but also allows the robot to detect abnormalities and recover from damage.

Motion planning for a robot generally requires full knowledge of its structure. Here Hu and colleagues present a method for inferring the structure of a robot from visual information.

Volume 7, Issue 3, pp. 484-494

Citations: 0

Seeking visions for sustainable AI

IF 18.8 | Tier 1, Computer Science

Nature Machine Intelligence Pub Date: 2025-02-24 DOI: 10.1038/s42256-025-01008-8

Citations: 0