Environment ensemble models for genomic prediction in common bean (Phaseolus vulgaris L.)
Authors: Isabella Chiaravallotti, Owen Pauptit, Valerio Hoyos-Villegas
Journal: Plant Genome, volume 18, issue 2, e70057 (published 2025-06-01; impact factor 3.9; JCR Q1, Genetics & Heredity)
DOI: https://doi.org/10.1002/tpg2.70057
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12159719/pdf/
Abstract
For important food crops such as the common bean (Phaseolus vulgaris L.), global demand continues to outpace the rate of genetic gain for quantitative traits. In this study, we leveraged the multi-environment trial (MET) dataset from the Cooperative Dry Bean Nursery (CDBN) to investigate the use of ensemble models for genomic prediction. This dataset spans 70 locations and 30 years, and covers over 150 phenotypes and hundreds of genotypes sequenced for 1.2 million single nucleotide polymorphism markers. We tested three models (linear regression, ridge regression, and neural networks). Each model was implemented using three approaches: (1) a singular model, trained on all data combined; (2) an ensemble model, in which a separate submodel was trained on each available single location; and (3) an optimized ensemble model, in which submodels were trained only on optimized sets of single locations. The optimized ensemble approach worked best for low-variance locations because averaging across the submodels in the ensemble reduced model variance. For models with low prediction accuracy, the ensemble approach can increase accuracy. In certain locations, prediction accuracy exceeded narrow-sense heritability, indicating that genomic selection is more efficient than phenotypic selection in these locations. This study indicates that breeding program collaboration can be a way to bypass the bottleneck of low data volume, as pooled data from the CDBN MET produced prediction accuracies of 0.70 for days to flowering, 0.54 for days to maturity, 0.95 for seed weight, and 0.67 for seed yield in individual locations.
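The three approaches described in the abstract can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the simulated markers and phenotypes, the closed-form ridge solver, the function names, and in particular `select_submodels`, a hypothetical top-k rule standing in for the paper's optimization step, which the abstract does not specify. Accuracy is taken here as the Pearson correlation between observed and predicted values, a common convention that the abstract does not confirm.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on centered data; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def predict(model, X):
    w, b = model
    return X @ w + b

def fit_ensemble(X, y, locations, alpha=1.0):
    """Train one ridge submodel per location."""
    return {loc: fit_ridge(X[locations == loc], y[locations == loc], alpha)
            for loc in np.unique(locations)}

def predict_ensemble(submodels, X, use=None):
    """Average the predictions of the chosen submodels (all of them by default)."""
    keys = list(submodels) if use is None else list(use)
    return np.mean([predict(submodels[k], X) for k in keys], axis=0)

def accuracy(y_true, y_pred):
    """Pearson correlation between observed and predicted phenotypes."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def select_submodels(submodels, X_val, y_val, k):
    """Hypothetical optimization rule: keep the k submodels that score best on validation data."""
    scores = {loc: accuracy(y_val, predict(m, X_val)) for loc, m in submodels.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Synthetic stand-in for a multi-environment trial (not CDBN data).
rng = np.random.default_rng(0)
n_loc, n_per_loc, n_markers = 5, 60, 50
effects = rng.normal(size=n_markers)                       # simulated marker effects
X = rng.integers(0, 3, size=(n_loc * n_per_loc, n_markers)).astype(float)
locations = np.repeat(np.arange(n_loc), n_per_loc)
loc_shift = rng.normal(scale=3.0, size=n_loc)              # location main effects
y = X @ effects + loc_shift[locations] + rng.normal(scale=2.0, size=len(locations))

# New genotypes at a target location, split into validation and test halves.
X_new = rng.integers(0, 3, size=(80, n_markers)).astype(float)
y_new = X_new @ effects + rng.normal(scale=2.0, size=80)
X_val, y_val, X_test, y_test = X_new[:40], y_new[:40], X_new[40:], y_new[40:]

singular = fit_ridge(X, y)                                 # approach 1
submodels = fit_ensemble(X, y, locations)                  # approach 2
chosen = select_submodels(submodels, X_val, y_val, k=3)    # approach 3 (assumed rule)

acc_singular = accuracy(y_test, predict(singular, X_test))
acc_ensemble = accuracy(y_test, predict_ensemble(submodels, X_test))
acc_optimized = accuracy(y_test, predict_ensemble(submodels, X_test, use=chosen))
print(f"singular={acc_singular:.2f} ensemble={acc_ensemble:.2f} optimized={acc_optimized:.2f}")
```

Averaging submodel predictions is what reduces model variance in this sketch: each submodel sees only one location's data, so its errors are partly independent of the others. The accuracies printed for this toy data say nothing about the values reported for the CDBN dataset.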
About the journal:
The Plant Genome publishes original research investigating all aspects of plant genomics. Technical breakthroughs reporting improvements in the efficiency and speed of acquiring and interpreting plant genomics data are welcome. The editorial board gives preference to novel reports that use innovative genomic applications that advance our understanding of plant biology that may have applications to crop improvement. The journal also publishes invited review articles and perspectives that offer insight and commentary on recent advances in genomics and their potential for agronomic improvement.