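Incorporating information of causal variants in genomic prediction using GBLUP or machine learning models in a simulated livestock population.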

IF 6.5 Q1 AGRICULTURE, DAIRY & ANIMAL SCIENCE

Journal of Animal Science and Biotechnology. Pub Date: 2025-08-19. DOI: 10.1186/s40104-025-01250-5

Jifan Yang, Mario P L Calus, Yvonne C J Wientjes, Theo H E Meuwissen, Pascal Duenk

Volume 16, Issue 1, Article 118. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12362903/pdf/

Citations: 0

Abstract

Background: Genomic prediction has revolutionized animal breeding, with GBLUP being the most widely used prediction model. In theory, the accuracy of genomic prediction could be improved by incorporating information from quantitative trait loci (QTL). This strategy could be especially beneficial for machine learning models that are able to distinguish informative from uninformative features. The objective of this study was to assess the benefit of incorporating QTL genotypes in GBLUP and machine learning models. We simulated a selected livestock population in which the QTL and their effects were known. We used four genomic prediction models, namely GBLUP, (weighted) 2GBLUP, random forest (RF), and support vector regression (SVR), to predict the breeding values of young animals, and considered scenarios that varied in the proportion of genetic variance explained by the included QTL.
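The contrast between GBLUP and 2GBLUP described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that 2GBLUP denotes a model with two genomic relationship matrices, one built from the included QTL and one from the markers, which the abstract does not spell out. It also assumes that the variance components are known from the simulation rather than estimated, and it draws genotypes independently, so the toy data contain no linkage disequilibrium or selection. The names `vanraden_grm`, `blup_genetic_values`, and `simulate_and_compare` are hypothetical helpers introduced for illustration only.

```python
import numpy as np


def vanraden_grm(genotypes):
    """Genomic relationship matrix from 0/1/2 genotypes (VanRaden method 1)."""
    p = genotypes.mean(axis=0) / 2.0            # allele frequency per locus
    z = genotypes - 2.0 * p                     # centred genotypes
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


def blup_genetic_values(kernels, variances, y, train, test, var_e):
    """BLUP of genetic values for `test` animals from `train` phenotypes.

    The genetic covariance is sum_k variances[k] * kernels[k]; the variance
    components are treated as known (an assumption of this sketch).
    """
    K = sum(v * G for v, G in zip(variances, kernels))
    V = K[np.ix_(train, train)] + var_e * np.eye(len(train))
    alpha = np.linalg.solve(V, y[train] - y[train].mean())
    return K[np.ix_(test, train)] @ alpha


def simulate_and_compare(seed=1, n=400, n_markers=500, n_qtl=20,
                         n_included=10, h2=0.5):
    """Toy comparison of GBLUP and a two-kernel GBLUP on independent loci."""
    rng = np.random.default_rng(seed)
    markers = rng.binomial(2, rng.uniform(0.1, 0.9, n_markers),
                           size=(n, n_markers)).astype(float)
    qtl = rng.binomial(2, rng.uniform(0.1, 0.9, n_qtl),
                       size=(n, n_qtl)).astype(float)
    effects = rng.normal(size=n_qtl)
    tbv = qtl @ effects                         # true breeding values
    var_g = tbv.var()
    var_e = var_g * (1.0 - h2) / h2
    y = tbv + rng.normal(scale=np.sqrt(var_e), size=n)

    train = np.arange(0, n - 100)               # animals with phenotypes
    test = np.arange(n - 100, n)                # animals to be predicted

    included = qtl[:, :n_included]              # QTL assumed to be known
    # Proportion of genetic variance explained by the included QTL.
    prop = min((included @ effects[:n_included]).var() / var_g, 1.0)

    # GBLUP: one kernel from markers plus the included QTL genotypes.
    g_all = vanraden_grm(np.hstack([markers, included]))
    ebv_gblup = blup_genetic_values([g_all], [var_g], y, train, test, var_e)

    # Two-kernel model: separate kernels for included QTL and for markers.
    g_qtl, g_mk = vanraden_grm(included), vanraden_grm(markers)
    ebv_2gblup = blup_genetic_values(
        [g_qtl, g_mk], [prop * var_g, (1.0 - prop) * var_g],
        y, train, test, var_e)

    def accuracy(ebv):
        # Correlation between predicted and true breeding values.
        return float(np.corrcoef(ebv, tbv[test])[0, 1])

    return {"GBLUP": accuracy(ebv_gblup), "2GBLUP": accuracy(ebv_2gblup),
            "prop": float(prop)}


if __name__ == "__main__":
    print(simulate_and_compare())
```

Because the toy loci are independent, the numbers this prints say nothing about the accuracies reported in the study; the sketch only shows how a separate QTL kernel lets the model assign the included QTL their own share of the genetic variance.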

Results: 2GBLUP resulted in the highest accuracy. Its accuracy increased as long as the included QTL explained up to 80% of the genetic variance, after which the accuracy dropped. With a weighted 2GBLUP model, the accuracy always increased when more QTL were included. The prediction accuracy of GBLUP was consistently higher than that of SVR, and the accuracy of both models increased slightly as more QTL information was included. The RF model resulted in the lowest prediction accuracy, which did not improve when QTL information was included.

Conclusions: Our results show that incorporating QTL information in GBLUP and SVR can improve prediction accuracy, but the extent of improvement varies across models. RF had a much lower prediction accuracy than the other models and did not show improvements when QTL information was added. Two possible reasons for this result are that the structure of our data does not allow RF to fully realize its potential and that RF is not well designed for this particular prediction problem. Our study highlights the importance of selecting appropriate models for genomic prediction and underscores the potential limitations of machine learning models when applied to genomic prediction in livestock.

Source journal: Journal of Animal Science and Biotechnology

CiteScore: 10.30

Self-citation rate: 0.00%

Articles published: 822