Incorporating information of causal variants in genomic prediction using GBLUP or machine learning models in a simulated livestock population.

Impact factor 6.5; JCR Q1 (Agriculture, Dairy & Animal Science)
Jifan Yang, Mario P L Calus, Yvonne C J Wientjes, Theo H E Meuwissen, Pascal Duenk
Journal of Animal Science and Biotechnology, 16(1):118. Published 2025-08-19.
DOI: https://doi.org/10.1186/s40104-025-01250-5
PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12362903/pdf/

Abstract

Background: Genomic prediction has revolutionized animal breeding, with genomic best linear unbiased prediction (GBLUP) being the most widely used prediction model. In theory, the accuracy of genomic prediction could be improved by incorporating information from quantitative trait loci (QTL). This strategy could be especially beneficial for machine learning models that are able to distinguish informative from uninformative features. The objective of this study was to assess the benefit of incorporating QTL genotypes in GBLUP and machine learning models. We simulated a livestock population under selection in which the QTL and their effects were known. We used four genomic prediction models to predict breeding values of young animals: GBLUP, (weighted) 2GBLUP, random forest (RF), and support vector regression (SVR). We considered scenarios that varied in the proportion of genetic variance explained by the included QTL.

Results: 2GBLUP resulted in the highest accuracy. Its accuracy increased as long as the included QTL explained up to 80% of the genetic variance, after which the accuracy dropped. With a weighted 2GBLUP model, the accuracy always increased when more QTL were included. The prediction accuracy of GBLUP was consistently higher than that of SVR, and the accuracy of both models increased slightly as more QTL information was included. The RF model resulted in the lowest prediction accuracy, which did not improve when QTL information was included.

Conclusions: Our results show that incorporating QTL information in GBLUP and SVR can improve prediction accuracy, but the extent of improvement varies across models. RF had a much lower prediction accuracy than the other models and did not improve when QTL information was added. Two possible reasons for this result are that the structure of our data does not allow RF to fully realize its potential and that RF is not well suited to this particular prediction problem. Our study highlights the importance of selecting appropriate models for genomic prediction and underscores the potential limitations of machine learning models when applied to genomic prediction in livestock.
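
The difference between GBLUP and 2GBLUP described in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: the toy data (independent markers, unrelated animals, no selection), the VanRaden-style relationship matrix, the fixed variance components (which would normally be estimated, for example by REML), the 0.8/0.2 split between the two kernels, and all function and variable names are assumptions made for illustration. Accuracy is taken here as the correlation between predicted and simulated true breeding values of the validation animals, which is also an assumption about the study's definition.

```python
import numpy as np

def vanraden_grm(genotypes):
    """Relationship matrix from an (animals x markers) matrix of 0/1/2 genotypes."""
    freq = genotypes.mean(axis=0) / 2.0            # allele frequency per marker
    centered = genotypes - 2.0 * freq              # centre each marker column
    scale = 2.0 * np.sum(freq * (1.0 - freq))      # expected marker variance, summed
    return centered @ centered.T / scale

def blup_predict(kernels, variances, resid_var, y_train, train_idx):
    """BLUP of genetic values for all animals, with variance components assumed known."""
    cov = sum(v * k for v, k in zip(variances, kernels))   # genetic covariance
    v_train = cov[np.ix_(train_idx, train_idx)] + resid_var * np.eye(len(train_idx))
    alpha = np.linalg.solve(v_train, y_train - y_train.mean())
    return cov[:, train_idx] @ alpha

# Toy data (assumed): 300 animals, 1000 markers, 50 of which are QTL.
rng = np.random.default_rng(1)
n_animals, n_markers, n_qtl = 300, 1000, 50
freq = rng.uniform(0.1, 0.9, n_markers)
genotypes = rng.binomial(2, freq, size=(n_animals, n_markers)).astype(float)

is_qtl = np.zeros(n_markers, dtype=bool)
is_qtl[rng.choice(n_markers, n_qtl, replace=False)] = True
bv = genotypes[:, is_qtl] @ rng.normal(size=n_qtl)  # true breeding values
bv = (bv - bv.mean()) / bv.std()                    # genetic variance scaled to 1
y = bv + rng.normal(size=n_animals)                 # residual variance 1

train_idx = np.arange(200)                          # animals with phenotypes
val_idx = np.arange(200, n_animals)                 # "young" animals to predict

g_all = vanraden_grm(genotypes)                     # one kernel: all markers
g_qtl = vanraden_grm(genotypes[:, is_qtl])          # kernel from QTL genotypes
g_snp = vanraden_grm(genotypes[:, ~is_qtl])         # kernel from remaining markers

# GBLUP: a single relationship matrix. 2GBLUP: two matrices with separate variances.
pred_gblup = blup_predict([g_all], [1.0], 1.0, y[train_idx], train_idx)
pred_2gblup = blup_predict([g_qtl, g_snp], [0.8, 0.2], 1.0, y[train_idx], train_idx)

def accuracy(pred):
    """Correlation between predicted and true breeding values in the validation set."""
    return float(np.corrcoef(pred[val_idx], bv[val_idx])[0, 1])

print("GBLUP accuracy:", round(accuracy(pred_gblup), 3))
print("2GBLUP accuracy:", round(accuracy(pred_2gblup), 3))
```

Passing the same matrix twice with the variance split in half reproduces the single-kernel prediction, which shows that 2GBLUP differs from GBLUP only through how the markers are partitioned and how variance is assigned to each partition. The toy population has no linkage disequilibrium or family structure, so its accuracies should not be compared with those reported in the study.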
