Disentangling soybean GxE effects in an integrated genomic prediction and machine learning-GWAS workflow.

IF 4.4 · Zone 2 (Biology) · JCR Q1, Biochemical Research Methods
Niel Verbrigghe, Hilde Muylle, Marie Pegard, Hendrik Rietman, Vuk Đorđević, Marina Ćeran, Isabel Roldán-Ruiz
Plant Methods 21(1): 119 (published 2025-08-25). https://doi.org/10.1186/s13007-025-01434-0
Citations: 0

Abstract

Integrating genotype-by-environment (GxE) interactions into genomic prediction models has been shown to improve prediction accuracy for crops exposed to unfavourable environmental conditions. However, despite the increasing complexity of the machine learning models used in genomic prediction, no model or approach has proven consistently superior to a classical genomic best linear unbiased prediction (GBLUP) model. In this paper, we compared two GBLUP models (a linear mixed effects model and a Bayesian GBLUP) with two machine learning models (Random Forest and Extreme Gradient Boosting) on the EUCLEG soybean genotype set phenotyped in Belgium and Serbia. The Bayesian GBLUP and the two machine learning methods performed similarly. However, using a workflow that decomposed the environment-specific BLUPs into a main genetic effect and a GxE interaction effect, we found higher predictive ability for the interaction component than with a single-component approach. Furthermore, conducting a machine learning genome-wide association study (ML-GWAS) on both components allowed us to identify important markers for the main genetic effect, as well as environment-specific markers, which could then be associated with correlated markers in other environments. Finally, a small random forest model built on only 50 uncorrelated, important markers achieved predictive ability similar to that of the large models including all markers, across all scenarios. These results demonstrate a new, integrated genomic prediction and ML-GWAS approach aimed at high predictive ability and coupled marker detection in the soybean genome for traits phenotyped in different environments.
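The decomposition of environment-specific BLUPs into a main genetic effect and a GxE component can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so this assumes the simplest additive split: the main effect is a genotype's mean BLUP across environments, and the GxE effect is each environment's deviation from that mean. The function name `decompose_blups` and the environment labels `BE` and `RS` are hypothetical.

```python
from statistics import mean


def decompose_blups(
    blups: dict[str, dict[str, float]],
) -> tuple[dict[str, float], dict[str, dict[str, float]]]:
    """Split environment-specific BLUPs into a main effect and GxE deviations.

    blups maps genotype -> {environment: BLUP}.
    Assumption (not taken from the paper): main effect = mean BLUP across
    environments; GxE effect = environment BLUP minus that mean.
    """
    main: dict[str, float] = {}
    gxe: dict[str, dict[str, float]] = {}
    for genotype, by_env in blups.items():
        g = mean(by_env.values())  # main genetic effect of this genotype
        main[genotype] = g
        # Environment-specific deviation from the main effect
        gxe[genotype] = {env: value - g for env, value in by_env.items()}
    return main, gxe


# Hypothetical BLUPs for two genotypes in two environments
blups = {"G1": {"BE": 2.0, "RS": 4.0}, "G2": {"BE": -1.0, "RS": 1.0}}
main, gxe = decompose_blups(blups)
print(main)       # {'G1': 3.0, 'G2': 0.0}
print(gxe["G1"])  # {'BE': -1.0, 'RS': 1.0}
```

Under this assumption, each genotype's GxE deviations sum to zero across environments, so the interaction component carries only what the main effect does not. Each component can then be used as a separate prediction target; the paper's actual decomposition may differ.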
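A compact model built on 50 uncorrelated, important markers implies a filter that keeps highly ranked markers and discards markers strongly correlated with ones already chosen. Below is a hedged Python sketch of one such greedy filter. The importance scores (for example, from a random forest), the Pearson-correlation criterion, the 0.5 threshold, and the names `select_uncorrelated` and `pearson` are assumptions, not the authors' documented procedure.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length marker vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_uncorrelated(
    importance: dict[str, float],
    genotypes: dict[str, list[float]],
    k: int = 50,
    max_abs_r: float = 0.5,
) -> list[str]:
    """Greedily keep up to k markers, most important first, skipping any
    marker whose |r| with an already kept marker is >= max_abs_r."""
    selected: list[str] = []
    for marker in sorted(importance, key=importance.get, reverse=True):
        if len(selected) >= k:
            break
        if all(abs(pearson(genotypes[marker], genotypes[s])) < max_abs_r for s in selected):
            selected.append(marker)
    return selected


# Hypothetical marker scores and 0/1/2 genotype codes for six lines
importance = {"m1": 0.9, "m2": 0.8, "m3": 0.5}
genotypes = {
    "m1": [0, 1, 2, 0, 1, 2],
    "m2": [0, 1, 2, 0, 1, 2],  # identical to m1 (r = 1), so it is skipped
    "m3": [0, 0, 0, 2, 2, 2],  # uncorrelated with m1 (r = 0)
}
print(select_uncorrelated(importance, genotypes, k=2))  # ['m1', 'm3']
```

Ranking by importance before filtering means that, within each block of correlated markers, the highest-scoring marker is the one retained.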

Source journal: Plant Methods (Biology: Plant Sciences)
CiteScore: 9.20
Self-citation rate: 3.90%
Articles published: 121
Review time: 2 months
Journal description: Plant Methods is an open access, peer-reviewed, online journal for the plant research community that encompasses all aspects of technological innovation in the plant sciences. There is no doubt that we have entered an exciting new era in plant biology. The completion of the Arabidopsis genome sequence, and the rapid progress being made in other plant genomics projects are providing unparalleled opportunities for progress in all areas of plant science. Nevertheless, enormous challenges lie ahead if we are to understand the function of every gene in the genome, and how the individual parts work together to make the whole organism. Achieving these goals will require an unprecedented collaborative effort, combining high-throughput, system-wide technologies with more focused approaches that integrate traditional disciplines such as cell biology, biochemistry and molecular genetics. Technological innovation is probably the most important catalyst for progress in any scientific discipline. Plant Methods’ goal is to stimulate the development and adoption of new and improved techniques and research tools and, where appropriate, to promote consistency of methodologies for better integration of data from different laboratories.