Optimizing fully-efficient two-stage models for genomic selection using open-source software.

IF 4.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS

Plant Methods Pub Date : 2025-02-04 DOI:10.1186/s13007-024-01318-9

Javier Fernández-González, Julio Isidro Y Sánchez

Plant Methods 21(1):9 (2025). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11796230/pdf/

Citations: 0

Abstract

Genomic-assisted breeding has transitioned from theoretical concept to practical application. Genomic selection (GS) predicts genomic estimated breeding values (GEBVs) using dense genetic markers. Single-stage models predict GEBVs from phenotypic observations in one step, fully accounting for the entire variance-covariance structure among genotypes, but face computational challenges. Two-stage models, preferred for their simplicity and efficiency, first calculate adjusted genotypic means accounting for spatial variation within each environment, then use these means to predict GEBVs. However, unweighted (UNW) two-stage models assume independent errors among adjusted means, neglecting correlations among estimation errors. Here, we show that fully-efficient two-stage models perform similarly to UNW models for randomized complete block designs but substantially better for augmented designs. Our simulation studies demonstrate the impact of the fully-efficient methodology on prediction accuracy across different implementations and scenarios. Incorporating non-additive effects and augmented designs significantly improved accuracy, emphasizing the synergy between design and model strategy. Consistent performance requires the estimation error covariance to be incorporated into a random effect (Full_R model) rather than into the residuals. Our results suggest that the fully-efficient methodology, particularly the Full_R model, should be more prevalent, especially as GS increases the appeal of sparse designs. We also provide a comprehensive theoretical background and open-source R code, enhancing understanding and facilitating broader adoption of fully-efficient two-stage models in GS. Finally, we offer insights into the practical applications of fully-efficient models and their potential to increase genetic gain, demonstrating a 13.80% improvement after five selection cycles when moving from UNW to Full_R models.
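
The contrast between unweighted and fully-efficient second stages can be illustrated with a minimal sketch. This is not the authors' R code and it is not the Full_R parameterization itself; it only assumes a textbook mixed model y = mu + g + e for the stage-one adjusted means, with g ~ N(0, var_g K) and e ~ N(0, Omega), where Omega is the stage-one estimation error covariance. The function names, the toy numbers, and the assumption that the variance components are already known are all hypothetical choices made for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def stage_two_blup(y, K, omega, var_g):
    """BLUP of genotypic values from stage-one adjusted means.

    Assumed model (illustrative): y = mu + g + e,
    g ~ N(0, var_g * K), e ~ N(0, omega), variance components known.
    """
    n = len(y)
    G = [[var_g * K[i][j] for j in range(n)] for i in range(n)]
    V = [[G[i][j] + omega[i][j] for j in range(n)] for i in range(n)]
    # GLS estimate of the mean: (1' V^-1 1)^-1 1' V^-1 y
    mu = sum(solve(V, y)) / sum(solve(V, [1.0] * n))
    # BLUP: g_hat = G V^-1 (y - mu)
    v_inv_r = solve(V, [yi - mu for yi in y])
    g_hat = [sum(G[i][j] * v_inv_r[j] for j in range(n)) for i in range(n)]
    return mu, g_hat


def unweighted_omega(omega):
    """UNW stand-in: replace omega by its mean diagonal times the identity."""
    n = len(omega)
    c = sum(omega[i][i] for i in range(n)) / n
    return [[c if i == j else 0.0 for j in range(n)] for i in range(n)]


# Toy data (hypothetical): genotype 3 is unreplicated, as in an augmented
# design, so its adjusted mean has a much larger estimation error variance.
y = [10.0, 12.0, 14.0]
K = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
omega = [[0.5, 0.2, 0.0], [0.2, 0.5, 0.0], [0.0, 0.0, 4.0]]

print("fully weighted:", stage_two_blup(y, K, omega, var_g=1.0))
print("unweighted:    ", stage_two_blup(y, K, unweighted_omega(omega), var_g=1.0))
```

In this toy setting the weighted fit shrinks the poorly estimated genotype more strongly than the unweighted fit does, which matches the intuition for why the two approaches diverge under sparse or augmented designs. In practice the variance components would be estimated rather than fixed, and a relationship matrix computed from markers would replace the identity K used here.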


Source journal: Plant Methods (Biology: Plant Sciences)

CiteScore: 9.20

Self-citation rate: 3.90%

Articles published: 121

Review time: 2 months

About the journal: Plant Methods is an open access, peer-reviewed, online journal for the plant research community that encompasses all aspects of technological innovation in the plant sciences. There is no doubt that we have entered an exciting new era in plant biology. The completion of the Arabidopsis genome sequence, and the rapid progress being made in other plant genomics projects are providing unparalleled opportunities for progress in all areas of plant science. Nevertheless, enormous challenges lie ahead if we are to understand the function of every gene in the genome, and how the individual parts work together to make the whole organism. Achieving these goals will require an unprecedented collaborative effort, combining high-throughput, system-wide technologies with more focused approaches that integrate traditional disciplines such as cell biology, biochemistry and molecular genetics. Technological innovation is probably the most important catalyst for progress in any scientific discipline. Plant Methods’ goal is to stimulate the development and adoption of new and improved techniques and research tools and, where appropriate, to promote consistency of methodologies for better integration of data from different laboratories.