Janek Sendrowski, Thomas Bataillon, Guillaume P Ramstein
In silico prediction of variant effects: promises and limitations for precision plant breeding.
Theoretical and Applied Genetics 138(8): 193. Journal article, published 2025-07-28.
DOI: 10.1007/s00122-025-04973-1
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12304032/pdf/
Citations: 0
Abstract
Key message: Sequence-based AI models show great potential for prediction of variant effects at high resolution, but their practical value in plant breeding remains to be confirmed through rigorous validation studies. Plant breeding has traditionally relied on phenotyping to select individuals with desirable traits, a process that is both costly and time-consuming. Increasingly, breeding strategies are shifting toward precision breeding, where causal variants are directly targeted based on their effects. To predict the effects of causal variants, in silico methods are emerging as efficient alternatives or complements to mutagenesis screens. Here, we review state-of-the-art machine learning methods for predicting variant effects in plants across both coding and noncoding regions, contrasting supervised approaches in functional genomics with unsupervised methods in comparative genomics. We discuss challenges in validating predictions and compare these methods with traditional association and comparative genomics techniques. We argue that modern sequence models extend traditional methods by generalizing across genomic contexts, fitting a unified model across loci rather than a separate model for each locus. In doing so, they address inherent limitations of traditional quantitative and evolutionary comparative genetics techniques. However, the accuracy and generalizability of sequence models heavily depend on the training data, highlighting the need for validation experiments. We point to successful applications of sequence models, especially with protein sequences, and identify areas for further improvement, especially in modeling regulatory sequences. While not yet mature for in silico-driven precision breeding, sequence models show strong potential to become an integral part of the breeder's toolbox.
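The central argument above, that sequence models fit one model shared across loci instead of a separate model per locus, can be illustrated with a deliberately tiny sketch. Everything in it is a hypothetical assumption rather than the authors' method: the variants, the effect values, and the choice of substitution type (ref>alt) as the only sequence feature. Real sequence models are neural networks over long sequence contexts; this toy only shows why sharing parameters across loci lets a model score a variant at a locus it has never observed.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical toy data (not from the review): (locus_id, ref, alt, measured_effect).
# By construction, the effect depends only on the substitution type, shared across loci.
Variant = Tuple[str, str, str, float]

TRAINING: List[Variant] = [
    ("locus1", "G", "A", -1.0),
    ("locus2", "C", "T", -1.0),
    ("locus3", "A", "G", 0.5),
    ("locus4", "G", "A", -1.0),
    ("locus5", "T", "C", 0.5),
]


def per_locus_estimates(data: List[Variant]) -> Dict[str, float]:
    """Association-style: a separate mean effect for each locus, nothing shared."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for locus, _ref, _alt, effect in data:
        sums[locus] += effect
        counts[locus] += 1
    return {locus: sums[locus] / counts[locus] for locus in sums}


def fit_unified_model(data: List[Variant]) -> Dict[str, float]:
    """Sequence-model style (greatly simplified): one parameter per substitution
    type, shared by every locus. With one-hot features, no intercept and squared
    loss, the least-squares weight of each type is its mean observed effect."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for _locus, ref, alt, effect in data:
        key = f"{ref}>{alt}"
        sums[key] += effect
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def predict_unified(model: Dict[str, float], ref: str, alt: str) -> Optional[float]:
    """Predict a variant's effect at any locus, including unseen ones.
    Returns None for substitution types absent from the training data."""
    return model.get(f"{ref}>{alt}")


if __name__ == "__main__":
    per_locus = per_locus_estimates(TRAINING)
    unified = fit_unified_model(TRAINING)
    # A G>A variant at a locus never observed during training:
    print("per-locus:", per_locus.get("locus6"))            # no estimate exists
    print("unified:", predict_unified(unified, "G", "A"))   # pooled from locus1 and locus4
```

The per-locus table has no entry for the unobserved locus, whereas the shared model predicts a G>A effect there by pooling information from other loci. The same sharing is also the source of the risk the abstract flags: if the shared features do not actually drive the effect, or the training data are unrepresentative, the predictions generalize wrongly, which is why validation experiments remain necessary.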
About the journal:
Theoretical and Applied Genetics publishes original research and review articles in all key areas of modern plant genetics, plant genomics and plant biotechnology. All work needs to have a clear genetic component and significant impact on plant breeding. Theoretical considerations are only accepted in combination with new experimental data and/or if they indicate a relevant application in plant genetics or breeding. Emphasizing the practical, the journal focuses on research into leading crop plants and articles presenting innovative approaches.