Predicting natural variation in the yeast phenotypic landscape with machine learning.

IF 7.7; Zone 1 (Biology); JCR Q1 (Biochemistry & Molecular Biology)
Sakshi Khaiwal, Matteo De Chiara, Benjamin P Barré, Inigo Barrio-Hernandez, Simon Stenberg, Pedro Beltrao, Jonas Warringer, Gianni Liti
Molecular Systems Biology. Journal Article. Published 2025-09-01. DOI: 10.1038/s44320-025-00136-y (https://doi.org/10.1038/s44320-025-00136-y)
Citations: 0

Abstract

Most organismal traits result from the complex interplay of many genetic and environmental factors, which makes them difficult to predict. Here, we used machine learning (ML) models to explore phenotype prediction for 223 traits measured across 1011 genome-sequenced Saccharomyces cerevisiae strains isolated worldwide. We benchmarked an ML pipeline with multiple linear and non-linear models to predict phenotypes from genotypes and gene expression, and identified gradient boosting machines as the best-performing model. Gene function disruption scores and gene presence/absence emerged as the best predictors, suggesting a considerable contribution of the accessory genome to controlling phenotypes. Prediction accuracy varied broadly among phenotypes, with stress resistance being easier to predict than growth across nutrients. ML identified relevant genomic features linked to phenotypes, including high-impact variants with established relationships to phenotypes, even though these are rare in the population. Near-perfect accuracies were achieved when other phenomics data, mostly from similar conditions, were used, suggesting that useful information can be conveyed across phenotypes. Overall, our study underscores the power of ML to interpret the functional outcome of genetic variants.
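
As a rough illustration of the approach summarized in the abstract, the toy sketch below fits a small gradient boosting model (decision stumps, squared-error loss) to a simulated gene presence/absence matrix and ranks genes by their accumulated gain. It is not the authors' pipeline: the data are synthetic, the two causal genes and their effect sizes are invented, and every function name (`simulate`, `fit_stump`, `fit_boosting`, `predict`, `r_squared`) is a hypothetical helper written for this example.

```python
import random

def simulate(n_strains, n_genes, rng):
    """Synthetic data (assumption): binary gene presence/absence per strain,
    with a phenotype driven by genes 0 and 3 plus Gaussian noise."""
    X = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(n_strains)]
    y = [2.0 * row[0] - 1.5 * row[3] + rng.gauss(0, 0.3) for row in X]
    return X, y

def fit_stump(X, residuals):
    """Find the single gene whose presence/absence split best explains the residuals.
    Returns (gene, mean_if_absent, mean_if_present, gain) or None."""
    best = None
    n, total = len(residuals), sum(residuals)
    for j in range(len(X[0])):
        n1 = sum(row[j] for row in X)
        n0 = n - n1
        if n0 == 0 or n1 == 0:
            continue  # gene is constant across strains, no split possible
        sum1 = sum(r for row, r in zip(X, residuals) if row[j] == 1)
        sum0 = total - sum1
        # Squared-error reduction relative to predicting zero everywhere.
        gain = sum0 * sum0 / n0 + sum1 * sum1 / n1
        if best is None or gain > best[3]:
            best = (j, sum0 / n0, sum1 / n1, gain)
    return best

def fit_boosting(X, y, rounds=60, learning_rate=0.3):
    """Gradient boosting for squared error: each round fits a stump to the residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, v0, v1, gain = stump
        v0, v1 = learning_rate * v0, learning_rate * v1  # shrink each update
        stumps.append((j, v0, v1))
        importance[j] += gain
        pred = [p + (v1 if row[j] else v0) for p, row in zip(pred, X)]
    return base, stumps, importance

def predict(model, X):
    base, stumps, _ = model
    return [base + sum(v1 if row[j] else v0 for j, v0, v1 in stumps) for row in X]

def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot

rng = random.Random(0)
X, y = simulate(400, 20, rng)
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

model = fit_boosting(X_train, y_train)
test_r2 = r_squared(y_test, predict(model, X_test))
top_genes = sorted(range(20), key=lambda j: model[2][j], reverse=True)[:2]
print(f"held-out R^2: {test_r2:.2f}; top-ranked genes: {sorted(top_genes)}")
```

Evaluating on held-out strains and ranking features by gain mirrors, in miniature, the two uses of ML the abstract mentions: predicting phenotypes and pointing to the genomic features that drive them. Real data would involve many more features, population structure, and proper cross-validation, none of which this sketch attempts.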

Source journal
Molecular Systems Biology (Biology: Biochemistry & Molecular Biology)
CiteScore: 18.50
Self-citation rate: 1.00%
Articles published: 62
Review time: 6-12 weeks
About the journal: Systems biology is a field that aims to understand complex biological systems by studying their components and how they interact. It is an integrative discipline that seeks to explain the properties and behavior of these systems. Molecular Systems Biology is a scholarly journal that publishes top-notch research in the areas of systems biology, synthetic biology, and systems medicine. It is an open access journal, meaning that its content is freely available to readers, and it is peer-reviewed to ensure the quality of the published work.