Sakshi Khaiwal, Matteo De Chiara, Benjamin P Barré, Inigo Barrio-Hernandez, Simon Stenberg, Pedro Beltrao, Jonas Warringer, Gianni Liti
Molecular Systems Biology, published 2025-09-01. DOI: 10.1038/s44320-025-00136-y
Predicting natural variation in the yeast phenotypic landscape with machine learning.
Most organismal traits result from the complex interplay of many genetic and environmental factors, making their prediction difficult. Here, we used machine learning (ML) models to explore phenotype prediction for 223 traits measured across 1011 genome-sequenced Saccharomyces cerevisiae strains isolated worldwide. We benchmarked an ML pipeline of multiple linear and non-linear models that predict phenotypes from genotypes and gene expression, and identified gradient boosting machines as the best-performing model. Gene function disruption scores and gene presence/absence emerged as the best predictors, suggesting a considerable contribution of the accessory genome to controlling phenotypes. Prediction accuracy varied broadly among phenotypes, with stress resistance being easier to predict than growth across nutrient conditions. ML identified relevant genomic features linked to phenotypes, including high-impact variants with established relationships to phenotypes, despite these variants being rare in the population. Near-perfect accuracies were achieved when other phenomic data, mostly measured in similar conditions, were used as predictors, suggesting that useful information can be conveyed across phenotypes. Overall, our study underscores the power of ML to interpret the functional outcome of genetic variants.
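The abstract names gradient boosting machines as the best-performing model and reports that they flagged rare, high-impact variants. As a rough illustration of that idea only, the sketch below implements squared-loss gradient boosting with single-feature decision stumps on hypothetical gene presence/absence data, using just the Python standard library. The simulated genes, allele frequencies, effect sizes, and all function names (`simulate`, `fit_stump`, `fit_boosting`, `r_squared`) are assumptions made for this example; it is not the authors' pipeline, which benchmarked several linear and non-linear models on real genotype and expression features.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Stump:
    feature: int  # index of the binary (presence/absence) feature used for the split
    left: float   # fitted value when the feature is absent (0)
    right: float  # fitted value when the feature is present (1)


@dataclass
class BoostedStumps:
    base: float   # initial prediction: mean phenotype of the training strains
    lr: float     # learning rate (shrinkage) applied to every stump
    stumps: list = field(default_factory=list)

    def predict_one(self, row):
        out = self.base
        for s in self.stumps:
            out += self.lr * (s.right if row[s.feature] else s.left)
        return out


def fit_stump(X, residuals):
    """Return the single-feature split that most reduces squared error, plus its gain."""
    n, total = len(residuals), sum(residuals)
    best, best_gain = None, 0.0
    for j in range(len(X[0])):
        s1 = sum(r for row, r in zip(X, residuals) if row[j])
        n1 = sum(1 for row in X if row[j])
        n0 = n - n1
        if n0 == 0 or n1 == 0:
            continue  # constant feature: no split possible
        s0 = total - s1
        # SSE(before split) - SSE(after split) for a two-group mean fit
        gain = s0 * s0 / n0 + s1 * s1 / n1 - total * total / n
        if best is None or gain > best_gain:
            best, best_gain = Stump(j, s0 / n0, s1 / n1), gain
    return best, best_gain


def fit_boosting(X, y, n_rounds=100, lr=0.1):
    """Gradient boosting with squared loss: each new stump fits the current residuals."""
    model = BoostedStumps(base=sum(y) / len(y), lr=lr)
    importance = [0.0] * len(X[0])
    pred = [model.base] * len(y)
    for _ in range(n_rounds):
        # Residuals are the negative gradient of 1/2 * (y - f)^2 with respect to f
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump, gain = fit_stump(X, residuals)
        if stump is None:
            break
        model.stumps.append(stump)
        importance[stump.feature] += gain  # accumulate split gain per feature
        pred = [p + lr * (stump.right if row[stump.feature] else stump.left)
                for p, row in zip(pred, X)]
    return model, importance


def r_squared(y, yhat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def simulate(n_strains=600, n_genes=20, seed=1):
    """Hypothetical data: gene 0 is a rare high-impact variant, gene 1 a common modest one."""
    rng = random.Random(seed)
    freqs = [0.08, 0.5] + [0.3] * (n_genes - 2)
    X = [[1 if rng.random() < f else 0 for f in freqs] for _ in range(n_strains)]
    y = [-3.0 * row[0] + 1.0 * row[1] + rng.gauss(0.0, 0.3) for row in X]
    return X, y


X, y = simulate()
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]
model, importance = fit_boosting(X_train, y_train)
test_r2 = r_squared(y_test, [model.predict_one(row) for row in X_test])
top_two = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)[:2]
print(f"held-out R^2 = {test_r2:.2f}; most important genes: {top_two}")
```

Depth-one stumps keep the sketch short but can only build an additive model; general-purpose gradient boosting implementations typically grow deeper trees so that interactions between features can also be captured.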
About the journal:
Systems biology is a field that aims to understand complex biological systems by studying their components and how they interact. It is an integrative discipline that seeks to explain the properties and behavior of these systems.
Molecular Systems Biology is a scholarly journal that publishes high-quality research in systems biology, synthetic biology, and systems medicine. It is a peer-reviewed, open-access journal, so its content is freely available to readers.