Improvements in Prediction Performance of Ensemble Approaches for Genomic Prediction in Crop Breeding

Shunichiro Tomura, Mark Cooper, Owen M. Powell
DOI: 10.1101/2024.09.06.611589 (https://doi.org/10.1101/2024.09.06.611589)
Source: bioRxiv - Bioinformatics
Published: 2024-09-08

Abstract

Improving prediction accuracy in genomic prediction is a key factor in accelerating genetic gain in crop breeding. The mainstream strategy for improving prediction performance has been to develop an individual prediction model that outperforms others across diverse prediction scenarios. However, this approach is limited when no individual model is consistently superior, an inconsistency attributed to complex nonlinear interactions among genetic markers. This inconsistency is expected given the No Free Lunch Theorem, which states that the average performance of an individual prediction model is expected to be equivalent to that of the others across all scenarios. Hence, we investigate the potential of a stacked ensemble as an alternative method. We consider two traits, days to anthesis (DTA) and tiller number (TILN), measured in a Nested Association Mapping study, referred to herein as TeoNAM, in which a public maize (Zea mays) inbred, W22, was crossed to five inbred Teosinte lines. The TeoNAM data set and the two traits were selected because prior evidence indicated that the traits are under the control of networks of genes, with high levels of segregation diversity for the nodes of the genetic networks. For both traits, the ensemble approach improved prediction performance, measured as the Pearson correlation, across all the proposed scenarios and in more than 95% of cases, compared with the six individual prediction models that contributed to the ensemble: rrBLUP, BayesB, RKHS, RF, SVR and GAT. This result indicates that ensemble approaches have the potential to enhance the performance of genomic prediction for crop breeding.
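
To make the idea of a stacked ensemble concrete, the following is a minimal sketch and not the authors' pipeline. It assumes synthetic marker data, three ridge regressions with different penalties as stand-ins for the six base models named above, and an ordinary least-squares meta-learner fitted on out-of-fold predictions. The names `ridge_fit`, `ridge_predict`, `stack_predict` and `pearson` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression; returns (intercept, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]),
                           Xc.T @ (y - y_mean))
    return y_mean - x_mean @ beta, beta


def ridge_predict(model, X):
    intercept, beta = model
    return intercept + X @ beta


def pearson(a, b):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(a, b)[0, 1])


def stack_predict(X_train, y_train, X_test, lambdas, n_folds=5, seed=0):
    """Stacking: fit a meta-learner on out-of-fold base predictions."""
    n = len(y_train)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)

    # Out-of-fold predictions: each row is predicted by base models
    # that never saw that row during fitting.
    oof = np.zeros((n, len(lambdas)))
    for held_out in folds:
        fit_idx = np.setdiff1d(np.arange(n), held_out)
        for j, lam in enumerate(lambdas):
            model = ridge_fit(X_train[fit_idx], y_train[fit_idx], lam)
            oof[held_out, j] = ridge_predict(model, X_train[held_out])

    # Meta-learner (assumed here): least squares with an intercept.
    design = np.column_stack([np.ones(n), oof])
    weights, *_ = np.linalg.lstsq(design, y_train, rcond=None)

    # Refit each base model on all training data, then combine.
    base_test = np.column_stack([
        ridge_predict(ridge_fit(X_train, y_train, lam), X_test)
        for lam in lambdas
    ])
    ensemble_test = weights[0] + base_test @ weights[1:]
    return base_test, ensemble_test


# Synthetic example: 200 lines, 50 markers coded 0/1/2 (all assumed).
rng = np.random.default_rng(42)
X = rng.integers(0, 3, size=(200, 50)).astype(float)
y = X @ rng.normal(size=50) + rng.normal(scale=2.0, size=200)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

lambdas = (1.0, 10.0, 100.0)
base_test, ensemble_test = stack_predict(X_train, y_train, X_test, lambdas)

for lam, column in zip(lambdas, base_test.T):
    print(f"ridge(lambda={lam}): r = {pearson(y_test, column):.3f}")
print(f"stacked ensemble: r = {pearson(y_test, ensemble_test):.3f}")
```

Fitting the meta-learner on out-of-fold predictions, rather than on in-sample fits, keeps the combination weights from rewarding base models that merely overfit the training data. Whether the ensemble actually beats its base models on a given data set is an empirical question; this sketch only shows the mechanics.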