Environment ensemble models for genomic prediction in common bean (Phaseolus vulgaris L.).

IF 3.9 · CAS zone 2 (Biology) · JCR Q1 GENETICS & HEREDITY

Plant Genome · Pub date: 2025-06-01 · DOI: 10.1002/tpg2.70057

Isabella Chiaravallotti, Owen Pauptit, Valerio Hoyos-Villegas

Volume 18, issue 2, article e70057.

Abstract

For important food crops such as the common bean (Phaseolus vulgaris L.), global demand continues to outpace the rate of genetic gain for quantitative traits. In this study, we used the multi-environment trial (MET) dataset from the Cooperative Dry Bean Nursery (CDBN) to investigate ensemble models for genomic prediction. The dataset spans 70 locations and 30 years and includes more than 150 phenotypes and hundreds of genotypes with sequence data for 1.2 million single nucleotide polymorphism (SNP) markers. We tested three models (linear regression, ridge regression, and neural networks), each implemented with three approaches: (1) a singular model, in which all data were combined into one model; (2) an ensemble model, in which each available single location was used to train its own submodel and the submodels were combined into one ensemble; and (3) an optimized ensemble model, in which only an optimized set of single locations was used to train the submodels. The optimized ensemble approach worked best for low-variance locations, because averaging across the submodels in the ensemble reduced model variance. For models with low prediction accuracy, the ensemble approach can increase accuracy. In certain locations, prediction accuracy exceeded narrow-sense heritability, indicating that genomic selection is more efficient than phenotypic selection in those locations. This study indicates that collaboration among breeding programs can bypass the bottleneck of low data volume: pooled data from the CDBN MET produced prediction accuracies in individual locations of 0.70 for days to flowering, 0.54 for days to maturity, 0.95 for seed weight, and 0.67 for seed yield.
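The contrast between the singular and ensemble approaches can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: marker genotypes are assumed to be coded 0/1/2, each submodel is a ridge regression fit on centred data with a hypothetical penalty `lam`, the ensemble prediction is assumed to be a plain unweighted mean of the per-location submodels, and prediction accuracy is taken as Pearson's correlation between predicted and observed values in a held-out location. The helper names (`fit_ridge`, `fit_ensemble`, `fit_singular`, `simulate_location`) and the simulated data are hypothetical. The abstract does not say how locations were chosen for the optimized ensemble, so that step is not reproduced here.

```python
import random
from statistics import mean


def solve(a, b):
    """Solve the square linear system a @ x = b (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:  # eliminate this column from every other row
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(X, y, lam=1.0):
    """Ridge regression on centred data (intercept unpenalized); returns predict(x)."""
    p = len(X[0])
    x_bar = [mean(row[j] for row in X) for j in range(p)]
    y_bar = mean(y)
    Xc = [[row[j] - x_bar[j] for j in range(p)] for row in X]
    yc = [v - y_bar for v in y]
    # Normal equations: (Xc^T Xc + lam * I) beta = Xc^T yc
    xtx = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    beta = solve(xtx, xty)
    return lambda x: y_bar + sum(b * (xj - m) for b, xj, m in zip(beta, x, x_bar))


def fit_singular(data_by_location, lam=1.0):
    """Singular approach (sketch): pool all training locations into one model."""
    X = [row for X_loc, _ in data_by_location.values() for row in X_loc]
    y = [v for _, y_loc in data_by_location.values() for v in y_loc]
    return fit_ridge(X, y, lam)


def fit_ensemble(data_by_location, lam=1.0):
    """Ensemble approach (sketch): one submodel per location, predictions averaged."""
    submodels = [fit_ridge(X, y, lam) for X, y in data_by_location.values()]
    return lambda x: mean(f(x) for f in submodels)


def accuracy(model, X, y):
    """Prediction accuracy as Pearson's r between predicted and observed values."""
    pred = [model(x) for x in X]
    mp, my = mean(pred), mean(y)
    cov = sum((u - mp) * (v - my) for u, v in zip(pred, y))
    return cov / (sum((u - mp) ** 2 for u in pred) * sum((v - my) ** 2 for v in y)) ** 0.5


def simulate_location(rng, effects, shift, n=40, noise=1.0):
    """Hypothetical toy data: 0/1/2 genotypes plus a location-specific mean shift."""
    X = [[rng.choice((0, 1, 2)) for _ in effects] for _ in range(n)]
    y = [shift + sum(e * g for e, g in zip(effects, row)) + rng.gauss(0, noise) for row in X]
    return X, y


if __name__ == "__main__":
    rng = random.Random(1)
    effects = [rng.gauss(0, 0.5) for _ in range(10)]  # 10 hypothetical marker effects
    train = {f"loc{i}": simulate_location(rng, effects, shift=rng.gauss(0, 2)) for i in range(5)}
    X_test, y_test = simulate_location(rng, effects, shift=0.0)
    print("singular r =", round(accuracy(fit_singular(train), X_test, y_test), 3))
    print("ensemble r =", round(accuracy(fit_ensemble(train), X_test, y_test), 3))
```

Averaging is the mechanism the abstract credits with reducing model variance: each per-location submodel is fit on less data and is noisier, and the mean of several such fits tends to vary less than any single one when their errors are not perfectly correlated.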

Source journal: Plant Genome (PLANT SCIENCES; GENETICS & HEREDITY)

CiteScore: 6.00

Self-citation rate: 4.80%

Review time: >12 weeks

Journal description: The Plant Genome publishes original research investigating all aspects of plant genomics. Technical breakthroughs reporting improvements in the efficiency and speed of acquiring and interpreting plant genomics data are welcome. The editorial board gives preference to novel reports that use innovative genomic applications that advance our understanding of plant biology that may have applications to crop improvement. The journal also publishes invited review articles and perspectives that offer insight and commentary on recent advances in genomics and their potential for agronomic improvement.