Predictive ability of multi-population genomic prediction methods of phenotypes for reproduction traits in Chinese and Austrian pigs

IF 3.1 | Zone 1 (Agricultural and Forestry Sciences) | JCR Q1, AGRICULTURE, DAIRY & ANIMAL SCIENCE

Genetics Selection Evolution | Pub Date: 2024-06-26 | DOI: 10.1186/s12711-024-00915-5

Xue Wang, Zipeng Zhang, Hehe Du, Christina Pfeiffer, Gábor Mészáros, Xiangdong Ding


Citations: 0

Abstract

Multi-population genomic prediction can rapidly expand the size of the reference population and improve genomic prediction ability. Machine learning (ML) algorithms have shown advantages in single-population genomic prediction of phenotypes. However, few studies have explored the effectiveness of ML methods for multi-population genomic prediction. In this study, 3720 Yorkshire pigs from Austria and four breeding farms in China were used, and single-trait genomic best linear unbiased prediction (ST-GBLUP), multitrait GBLUP (MT-GBLUP), Bayesian Horseshoe (BayesHE), and three ML methods (support vector regression (SVR), kernel ridge regression (KRR) and AdaBoost.R2) were compared to explore the optimal method for joint genomic prediction of phenotypes of Chinese and Austrian pigs through 10 replicates of fivefold cross-validation. We tested the performance of the methods in two scenarios: (i) including only one Austrian population and one Chinese pig population that were genetically linked based on principal component analysis (PCA) (designated as the “two-population scenario”) and (ii) adding reference populations that are unrelated based on PCA to the above two populations (designated as the “multi-population scenario”). Our results show that the use of MT-GBLUP in the two-population scenario resulted in an improvement of 7.1% in predictive ability compared to ST-GBLUP, while the use of SVR and KRR yielded improvements in predictive ability of 4.5% and 5.3%, respectively, compared to MT-GBLUP. SVR and KRR also yielded lower mean square errors (MSE) in most population and trait combinations. In the multi-population scenario, improvements in predictive ability of 29.7%, 24.4% and 11.1% were obtained compared to ST-GBLUP when using, respectively, SVR, KRR, and AdaBoost.R2. However, compared to MT-GBLUP, the potential of ML methods to improve predictive ability was not demonstrated.
Our study demonstrates that ML algorithms can achieve better prediction performance than multitrait GBLUP models in multi-population genomic prediction of phenotypes when the populations have similar genetic backgrounds; however, when reference populations that are unrelated based on PCA are added, the ML methods did not show a benefit. When the number of populations increased, only MT-GBLUP improved predictive ability in both validation populations, while the other methods showed improvement in only one population.
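
The evaluation design named in the abstract, 10 replicates of fivefold cross-validation scored by predictive ability and MSE, can be illustrated with a minimal sketch. Several things here are assumptions rather than details from the abstract: predictive ability is taken to be the Pearson correlation between observed and predicted values, the data are simulated, and `simple_regression` is a hypothetical one-predictor stand-in for the models actually compared (GBLUP, BayesHE, SVR, KRR, AdaBoost.R2). The function names are illustrative only.

```python
import random
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def repeated_kfold(n, k=5, replicates=10, seed=0):
    """Yield (replicate, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for rep in range(replicates):
        order = list(range(n))
        rng.shuffle(order)  # a fresh random partition for every replicate
        for fold in range(k):
            test = order[fold::k]  # every k-th shuffled index forms one fold
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield rep, train, test


def simple_regression(x_train, y_train, x_test):
    """Hypothetical placeholder model: one-predictor least squares."""
    mx, my = mean(x_train), mean(y_train)
    sxx = sum((a - mx) ** 2 for a in x_train)
    slope = sum((a - mx) * (b - my) for a, b in zip(x_train, y_train)) / sxx
    return [my + slope * (a - mx) for a in x_test]


def evaluate(x, y, fit_predict, k=5, replicates=10, seed=0):
    """Average correlation and MSE over all replicates and folds."""
    cors, mses = [], []
    for _, train, test in repeated_kfold(len(y), k, replicates, seed):
        pred = fit_predict([x[i] for i in train],
                           [y[i] for i in train],
                           [x[i] for i in test])
        obs = [y[i] for i in test]
        cors.append(pearson(obs, pred))
        mses.append(mean((o - p) ** 2 for o, p in zip(obs, pred)))
    return mean(cors), mean(mses)


# Simulated data (assumption): one predictor with a moderate signal.
data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) for _ in range(200)]
y = [0.6 * a + data_rng.gauss(0, 0.8) for a in x]

ability, mse = evaluate(x, y, simple_regression)
print(f"mean predictive ability: {ability:.3f}, mean MSE: {mse:.3f}")
```

Any other model can be compared under the same splits by passing a different `fit_predict` callable with the same three arguments and the same `seed`, which keeps the folds identical across methods.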




Source journal: Genetics Selection Evolution (Biology: Dairy and Animal Science)

CiteScore: 6.50

Self-citation rate: 9.80%

Number of publications:

Review time: 1 month

About the journal: Genetics Selection Evolution invites basic, applied and methodological content that will aid the current understanding and the utilization of genetic variability in domestic animal species. Although the focus is on domestic animal species, research on other species is invited if it contributes to the understanding of the use of genetic variability in domestic animals. Genetics Selection Evolution publishes results from all levels of study, from the gene to the quantitative trait, from the individual to the population, the breed or the species. Contributions concerning both the biological approach, from molecular genetics to quantitative genetics, as well as the mathematical approach, from population genetics to statistics, are welcome. Specific areas of interest include but are not limited to: gene and QTL identification, mapping and characterization, analysis of new phenotypes, high-throughput SNP data analysis, functional genomics, cytogenetics, genetic diversity of populations and breeds, genetic evaluation, applied and experimental selection, genomic selection, selection efficiency, and statistical methodology for the genetic analysis of phenotypes with quantitative and mixed inheritance.