Nature Methods | Pub Date: 2025-08-04 | DOI: 10.1038/s41592-025-02772-6
Constantin Ahlmann-Eltze, Wolfgang Huber, Simon Anders
{"title":"Deep-learning-based gene perturbation effect prediction does not yet outperform simple linear baselines","authors":"Constantin Ahlmann-Eltze, Wolfgang Huber, Simon Anders","doi":"10.1038/s41592-025-02772-6","DOIUrl":"10.1038/s41592-025-02772-6","url":null,"abstract":"Recent research in deep-learning-based foundation models promises to learn representations of single-cell data that enable prediction of the effects of genetic perturbations. Here we compared five foundation models and two other deep learning models against deliberately simple baselines for predicting transcriptome changes after single or double perturbations. None outperformed the baselines, which highlights the importance of critical benchmarking in directing and evaluating method development. The analysis presented in this Brief Communication shows that, despite their complexity, current deep learning models do not outperform linear baselines in predicting gene perturbation effects, thus emphasizing the importance of further method development and thorough evaluation.","PeriodicalId":18981,"journal":{"name":"Nature Methods","volume":"22 8","pages":"1657-1661"},"PeriodicalIF":32.1,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12328236/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144784796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nature Methods | Pub Date: 2025-08-04 | DOI: 10.1038/s41592-025-02741-z
Benedict Wolf, Pegi Shehu, Luca Brenker, Anna-Lisa von Bachmann, Ann-Sophie Kroell, Nicholas Southern, Stefan Holderbach, Joshua Eigenmann, Sabine Aschenbrenner, Jan Mathony, Dominik Niopek
{"title":"Rational engineering of allosteric protein switches by in silico prediction of domain insertion sites","authors":"Benedict Wolf, Pegi Shehu, Luca Brenker, Anna-Lisa von Bachmann, Ann-Sophie Kroell, Nicholas Southern, Stefan Holderbach, Joshua Eigenmann, Sabine Aschenbrenner, Jan Mathony, Dominik Niopek","doi":"10.1038/s41592-025-02741-z","DOIUrl":"10.1038/s41592-025-02741-z","url":null,"abstract":"Domain insertion engineering is a powerful approach to juxtapose otherwise separate biological functions, resulting in proteins with new-to-nature activities. Prominent examples are switchable protein variants, created by receptor domain insertion into effector proteins. Identifying suitable, allosteric sites for domain insertion, however, typically requires extensive screening and optimization. We present ProDomino, a machine learning pipeline to rationalize domain recombination, trained on a semisynthetic protein sequence dataset derived from naturally occurring intradomain insertion events. ProDomino robustly identifies domain insertion sites in proteins of biotechnological relevance, which we experimentally validated in Escherichia coli and human cells. Finally, we used light- and chemically regulated receptor domains as inserts and demonstrate the rapid, model-guided creation of potent, single-component opto- and chemogenetic protein switches. These include novel CRISPR–Cas9 and –Cas12a variants for inducible genome engineering in human cells. Our work enables one-shot domain insertion engineering and substantially accelerates the design of customized allosteric proteins. 
ProDomino is a machine learning-based method, trained on a semisynthetic domain insertion dataset, to guide the engineering of protein domain recombination.","PeriodicalId":18981,"journal":{"name":"Nature Methods","volume":"22 8","pages":"1698-1706"},"PeriodicalIF":32.1,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12328240/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144784798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nature Methods | Pub Date: 2025-08-04 | DOI: 10.1038/s41592-025-02750-y
Zev Kronenberg, Cillian Nolan, David Porubsky, Tom Mokveld, William J. Rowell, Sangjin Lee, Egor Dolzhenko, Pi-Chuan Chang, James M. Holt, Christopher T. Saunders, Nathan D. Olson, Cody J. Steely, Sean McGee, Andrea Guarracino, Nidhi Koundinya, William T. Harvey, W. Scott Watkins, Katherine M. Munson, Kendra Hoekzema, Khi Pin Chua, Xiao Chen, Cairbre Fanslow, Christine Lambert, Harriet Dashnow, Erik Garrison, Joshua D. Smith, Peter M. Lansdorp, Justin M. Zook, Andrew Carroll, Lynn B. Jorde, Deborah W. Neklason, Aaron R. Quinlan, Evan E. Eichler, Michael A. Eberle
{"title":"The Platinum Pedigree: a long-read benchmark for genetic variants","authors":"Zev Kronenberg, Cillian Nolan, David Porubsky, Tom Mokveld, William J. Rowell, Sangjin Lee, Egor Dolzhenko, Pi-Chuan Chang, James M. Holt, Christopher T. Saunders, Nathan D. Olson, Cody J. Steely, Sean McGee, Andrea Guarracino, Nidhi Koundinya, William T. Harvey, W. Scott Watkins, Katherine M. Munson, Kendra Hoekzema, Khi Pin Chua, Xiao Chen, Cairbre Fanslow, Christine Lambert, Harriet Dashnow, Erik Garrison, Joshua D. Smith, Peter M. Lansdorp, Justin M. Zook, Andrew Carroll, Lynn B. Jorde, Deborah W. Neklason, Aaron R. Quinlan, Evan E. Eichler, Michael A. Eberle","doi":"10.1038/s41592-025-02750-y","DOIUrl":"10.1038/s41592-025-02750-y","url":null,"abstract":"Recent advances in genome sequencing have improved variant calling in complex regions of the human genome. However, it is difficult to quantify variant calling performance because existing standards often focus on specificity, neglecting completeness in difficult-to-analyze regions. To create a more comprehensive truth set, we used Mendelian inheritance in a large pedigree (CEPH-1463) to filter variants across PacBio high-fidelity (HiFi), Illumina and Oxford Nanopore Technologies platforms. This generated a variant map with over 4.7 million single-nucleotide variants, 767,795 insertions and deletions (indels), 537,486 tandem repeats and 24,315 structural variants, covering 2.77 Gb of the GRCh38 genome. This work adds ~200 Mb of high-confidence regions, including 8% more small variants, and introduces the first tandem repeat and structural variant truth sets for NA12878 and her family. As an example of the value of this improved benchmark, we retrained DeepVariant using these data to reduce genotyping errors by ~34%. 
This work introduces a pedigree-derived benchmark for single-nucleotide variants, indels, structural variants and tandem repeats, offering a variant map to validate sequencing workflows or to support the development and evaluation of new variant callers.","PeriodicalId":18981,"journal":{"name":"Nature Methods","volume":"22 8","pages":"1669-1676"},"PeriodicalIF":32.1,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144784799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nature Methods | Pub Date: 2025-08-04 | DOI: 10.1038/s41592-025-02766-4
Noah Holzleitner, Julian Grünewald
{"title":"Machine learning-trained protein domain insertion for the design of switchable proteins","authors":"Noah Holzleitner, Julian Grünewald","doi":"10.1038/s41592-025-02766-4","DOIUrl":"10.1038/s41592-025-02766-4","url":null,"abstract":"ProDomino is a machine-learning model that efficiently predicts domain insertion sites in host proteins on the basis of amino acid sequence alone. The model enables the greatly accelerated design of functional multi-domain proteins, such as light-triggered or drug-triggered protein switches.","PeriodicalId":18981,"journal":{"name":"Nature Methods","volume":"22 8","pages":"1629-1631"},"PeriodicalIF":32.1,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144784797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nature Methods | Pub Date: 2025-07-31 | DOI: 10.1038/s41592-025-02725-z
{"title":"Learning the language of biological interactions","authors":"","doi":"10.1038/s41592-025-02725-z","DOIUrl":"10.1038/s41592-025-02725-z","url":null,"abstract":"Sliding Window Interaction Grammar (SWING) is an interaction language model that learns the lexicon and rules of protein interactions. SWING can predict the effects of genetic variations on these interactions in various biological contexts.","PeriodicalId":18981,"journal":{"name":"Nature Methods","volume":"22 8","pages":"1634-1635"},"PeriodicalIF":32.1,"publicationDate":"2025-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144760539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nature Methods | Pub Date: 2025-07-28 | DOI: 10.1038/s41592-025-02748-6
Zhizheng Wang, Qiao Jin, Chih-Hsuan Wei, Shubo Tian, Po-Ting Lai, Qingqing Zhu, Chi-Ping Day, Christina Ross, Robert Leaman, Zhiyong Lu
{"title":"GeneAgent: self-verification language agent for gene-set analysis using domain databases","authors":"Zhizheng Wang, Qiao Jin, Chih-Hsuan Wei, Shubo Tian, Po-Ting Lai, Qingqing Zhu, Chi-Ping Day, Christina Ross, Robert Leaman, Zhiyong Lu","doi":"10.1038/s41592-025-02748-6","DOIUrl":"10.1038/s41592-025-02748-6","url":null,"abstract":"Gene-set analysis seeks to identify the biological mechanisms underlying groups of genes with shared functions. Large language models (LLMs) have recently shown promise in generating functional descriptions for input gene sets but may produce factually incorrect statements, commonly referred to as hallucinations in LLMs. Here we present GeneAgent, an LLM-based AI agent for gene-set analysis that reduces hallucinations by autonomously interacting with biological databases to verify its own output. Evaluation of 1,106 gene sets collected from different sources demonstrates that GeneAgent is consistently more accurate than GPT-4 by a significant margin. We further applied GeneAgent to seven novel gene sets derived from mouse B2905 melanoma cell lines. Expert review confirmed that GeneAgent produces more relevant and comprehensive functional descriptions than GPT-4, providing valuable insights into gene functions and expediting knowledge discovery. 
GeneAgent is a language agent using large language models and self-verification to improve gene-set function annotation.","PeriodicalId":18981,"journal":{"name":"Nature Methods","volume":"22 8","pages":"1677-1685"},"PeriodicalIF":32.1,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12328209/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144732393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nature Methods | Pub Date: 2025-07-28 | DOI: 10.1038/s41592-025-02723-1
Jane C. Siwek, Alisa A. Omelchenko, Prabal Chhibbar, Sanya Arshad, AnnaElaine Rosengart, Iliyan Nazarali, Akash Patel, Kiran Nazarali, Javad Rahimikollu, Jeremy S. Tilstra, Mark J. Shlomchik, David R. Koes, Alok V. Joglekar, Jishnu Das
{"title":"Sliding Window Interaction Grammar (SWING): a generalized interaction language model for peptide and protein interactions","authors":"Jane C. Siwek, Alisa A. Omelchenko, Prabal Chhibbar, Sanya Arshad, AnnaElaine Rosengart, Iliyan Nazarali, Akash Patel, Kiran Nazarali, Javad Rahimikollu, Jeremy S. Tilstra, Mark J. Shlomchik, David R. Koes, Alok V. Joglekar, Jishnu Das","doi":"10.1038/s41592-025-02723-1","DOIUrl":"10.1038/s41592-025-02723-1","url":null,"abstract":"Protein language models embed protein sequences for different tasks. However, these are suboptimal at learning the language of protein interactions. We developed an interaction language model (iLM), Sliding Window Interaction Grammar (SWING) that leverages differences in amino-acid properties to generate an interaction vocabulary. SWING successfully predicted both class I and class II peptide–major histocompatibility complex interactions. Furthermore, the class I SWING model could uniquely cross-predict class II interactions, a complex prediction task not attempted by existing methods. Using human class I and II data, SWING accurately predicted murine class II peptide–major histocompatibility interactions involving risk alleles in systemic lupus erythematosus and type 1 diabetes. SWING accurately predicted how variants can disrupt specific protein–protein interactions, based on sequence information alone. SWING outperformed passive uses of protein language model embeddings, demonstrating the value of the unique iLM architecture. Overall, SWING is a generalizable zero-shot iLM that learns the language of protein–protein interactions. 
SWING is a versatile interaction language model that can learn the language of peptide and protein interactions.","PeriodicalId":18981,"journal":{"name":"Nature Methods","volume":"22 8","pages":"1707-1719"},"PeriodicalIF":32.1,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12328204/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144732394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nature Methods | Pub Date: 2025-07-25 | DOI: 10.1038/s41592-025-02769-1
Gat Rauner, Piyush B. Gupta, Charlotte Kuperwasser
{"title":"From 2D to 3D and beyond: the evolution and impact of in vitro tumor models in cancer research","authors":"Gat Rauner, Piyush B. Gupta, Charlotte Kuperwasser","doi":"10.1038/s41592-025-02769-1","DOIUrl":"10.1038/s41592-025-02769-1","url":null,"abstract":"In vitro tumor models are essential tools for cancer research, offering key insights into not only tumor biology but also therapeutic responses. The transition from traditional two-dimensional to three-dimensional organoid systems marks a paradigm shift in cancer modeling. Although two-dimensional models have been instrumental in elucidating fundamental molecular and genetic mechanisms, they fail to accurately replicate the intricate three-dimensional architecture and dynamic microenvironment characteristic of human tumors. Here we outline how advanced organoid technologies now enable more faithful recapitulation of tumor heterogeneity that better mimic native tissue mechanics and biochemistry. We discuss emerging methods, including air–liquid interface cultures, microfluidic tumor-on-a-chip devices and high-content imaging integrated with machine learning, which collectively address longstanding challenges such as matrix variability and the limited incorporation of immune and vascular elements. These innovations promise to enhance reproducibility and scalability while providing unprecedented insights into tumor biology, cancer progression and therapeutic strategies. 
This Perspective discusses recent progress in the development of in vitro tumor models and the current challenges in this field.","PeriodicalId":18981,"journal":{"name":"Nature Methods","volume":"22 9","pages":"1776-1787"},"PeriodicalIF":32.1,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144718214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}