AdaptiveGS: an explainable genomic selection framework based on adaptive stacking ensemble machine learning

IF 4.2 | Zone 1, Agricultural and Forestry Sciences | JCR Q1 (AGRONOMY)
Zhen Yang, Mei Song, Xianggeng Huang, Quanrui Rao, Shanghui Zhang, Zhongzheng Zhang, Chenyang Wang, Wenjia Li, Ran Qin, Chunhua Zhao, Yongzhen Wu, Han Sun, Guangchen Liu, Fa Cui
Theoretical and Applied Genetics, 138(9), 204. Published 2025-08-07. DOI: 10.1007/s00122-025-04991-z (https://doi.org/10.1007/s00122-025-04991-z)
Citations: 0

Abstract


Key message: We developed an adaptive, unified stacking framework for genomic selection and designed a model interpretation strategy to identify candidate significant SNPs for target traits.

Genomic selection (GS) is an important technique in modern molecular breeding. Stacking ensemble learning (SEL), a powerful machine learning (ML) approach to GS, combines multiple base learners (BLs) and blends their strengths to capture the complex relationships between phenotypes and genotypes. However, the key step of SEL, the selection of BLs, currently lacks an effective and unified framework. We developed adaptiveGS, the first adaptive and explainable data-driven BL selection strategy, to pre-screen the optimal BLs for the stacking GS framework and improve prediction accuracy. adaptiveGS ranks candidate models by the PR index, which draws on the Pearson correlation coefficient (PCC) and the normalized root mean square error (NRMSE), and takes the top 3 of 7 ML models (or a user-defined setting) as BLs.

We compared adaptiveGS with 13 other GS algorithms on a total of 21 traits (datasets) from 4 species. adaptiveGS outperformed the 13 models on most of the 21 traits, with an average prediction accuracy (PCC) of 0.703, an average improvement of 14.4%, demonstrating superior predictive accuracy and robustness. Furthermore, we used the SHapley Additive exPlanations (SHAP) technique to interpret adaptiveGS and to identify significant SNPs for trait variation and potential interaction effects between SNPs. adaptiveGS gives users of stacking GS an operable, unified way to improve prediction accuracy in breeding. The adaptiveGS package is available at https://github.com/yangzhen0117/adaptiveGS.
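
The pre-screening step described in the abstract (score each candidate model, keep the top 3 as base learners) can be illustrated with a small sketch. The abstract does not give the PR index formula, so the combination used here (PCC minus NRMSE) is purely an assumption for illustration, as is normalizing RMSE by the observed range. The model names and predictions are made-up toy values, and none of the function names below come from the adaptiveGS package.

```python
import math

def pcc(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def nrmse(y_true, y_pred):
    """RMSE divided by the observed range (ASSUMED normalization)."""
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return rmse / (max(y_true) - min(y_true))

def pr_index(y_true, y_pred):
    """HYPOTHETICAL PR index: rewards high PCC and penalizes high NRMSE.
    The real definition is not stated in the abstract."""
    return pcc(y_true, y_pred) - nrmse(y_true, y_pred)

def select_base_learners(y_true, candidate_preds, k=3):
    """Rank candidates by the (assumed) PR index and keep the top k."""
    scores = {name: pr_index(y_true, preds)
              for name, preds in candidate_preds.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Toy phenotypes and toy cross-validated predictions (all hypothetical).
y = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "model_a": [1.1, 1.9, 3.2, 3.8, 5.1],  # close to the observations
    "model_b": [2.0, 2.0, 3.0, 4.0, 4.0],  # right trend, coarse values
    "model_c": [3.0, 3.0, 3.0, 3.1, 3.0],  # nearly constant predictions
    "model_d": [5.0, 4.0, 3.0, 2.0, 1.0],  # anti-correlated
    "model_e": [1.5, 2.5, 3.5, 4.5, 5.5],  # perfect trend, constant bias
}

selected, scores = select_base_learners(y, candidates, k=3)
print(selected)  # ['model_a', 'model_e', 'model_b']
```

Combining a correlation measure with an error measure means a model such as model_e, which tracks the trend perfectly but is biased, ranks below a model that is both well correlated and close in absolute terms. In a stacking workflow, the predictions of the selected models would then serve as inputs to a meta-learner.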

Source journal
CiteScore: 9.60
Self-citation rate: 7.40%
Articles published: 241
Review time: 2.3 months
About the journal: Theoretical and Applied Genetics publishes original research and review articles in all key areas of modern plant genetics, plant genomics and plant biotechnology. All work needs to have a clear genetic component and significant impact on plant breeding. Theoretical considerations are only accepted in combination with new experimental data and/or if they indicate a relevant application in plant genetics or breeding. Emphasizing the practical, the journal focuses on research into leading crop plants and articles presenting innovative approaches.