A deep learning strategy for accurate identification of purebred and hybrid pigs across SNP chips

IF 6.5 | Zone 1, Agricultural and Forestry Sciences | JCR Q1, Agricultural and Biological Sciences
Zipeng Zhang, Zhengwen Fang, Yongwang Du, Yilin He, Changsong Qian, Weijian Ye, Ning Zhang, Jianan Zhang, Xiangdong Ding
Journal of Animal Science and Biotechnology. Published 2025-08-14. DOI: 10.1186/s40104-025-01249-y (https://doi.org/10.1186/s40104-025-01249-y)

Abstract

Breed identification plays an important role in conserving indigenous breeds, managing genetic resources, and developing effective breeding strategies. However, research on breed identification in livestock has mainly focused on purebreds, and prediction accuracy for hybrids has been lower. In this study, we present a multi-layer perceptron (MLP) model with a multi-output regression framework designed specifically for predicting the genomic breed composition of purebred and hybrid pigs. We used a total of 8,199 pigs from breeding farms in eight provinces in China, comprising Yorkshire, Landrace, Duroc, and Yorkshire × Landrace hybrids. All animals were genotyped with 1K, 50K and 100K SNP chips. In five replicates of fivefold cross-validation comparing MLP with random forest (RF), support vector regression (SVR) and Admixture, MLP achieved a breed identification accuracy of 100% for both hybrids and purebreds on the 50K and 100K SNP chips. SVR performed comparably to MLP, and both outperformed RF and Admixture. In independent testing, MLP yielded an accuracy of 100% for all three pure breeds and the hybrid across all SNP chips and panels, while SVR yielded 0.026%–0.121% lower accuracy than MLP. Compared with a classification-based framework, the multi-output regression framework improved prediction accuracy: MLP, RF and SVR all achieved consistent improvements across all six SNP chips/panels, especially in hybrid identification. The threshold used to call an animal purebred affected the methods differently. SVR, RF and Admixture were very sensitive to the threshold value, and their optimal thresholds fluctuated across scenarios, whereas the optimal threshold for MLP was 0.75 in all cases. A threshold of 0.65–0.75 is ideal for accurate breed identification. Among SNP chips of different densities, the 1K SNP chip was the most cost-effective, reaching 100% accuracy as the training set was enlarged. Including hybrid individuals in the training set benefited both purebred and hybrid identification. The new MLP strategy showed high accuracy and robust applicability across low-, medium- and high-density SNP chips. The multi-output regression framework improved prediction accuracy for every ML method tested, and the strategy may also be helpful for breed identification in other livestock.
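
To make the threshold idea concrete, the following is a minimal sketch of how predicted breed proportions might be turned into a purebred or hybrid call. It is not the authors' implementation: the abstract does not give the exact decision rule or target encoding, so the function names, the 0.5/0.5 target for a Yorkshire × Landrace cross, and the rule "purebred if the largest predicted proportion reaches the threshold" are all assumptions made for illustration. Only the three breed names and the 0.75 threshold come from the abstract.

```python
from typing import Dict

# Breed names taken from the abstract.
BREEDS = ("Yorkshire", "Landrace", "Duroc")


def target_composition(label: str) -> Dict[str, float]:
    """Regression target for one training animal (hypothetical encoding)."""
    if label in BREEDS:
        # A purebred is 100% its own breed.
        return {b: 1.0 if b == label else 0.0 for b in BREEDS}
    if label == "Yorkshire x Landrace":
        # Assumption: a first-generation cross is half of each parent breed.
        return {"Yorkshire": 0.5, "Landrace": 0.5, "Duroc": 0.0}
    raise ValueError(f"unknown label: {label}")


def call_breed(predicted: Dict[str, float], threshold: float = 0.75) -> str:
    """Convert predicted breed proportions into a breed call.

    Assumed rule: the animal is called purebred for the breed with the
    largest predicted proportion if that proportion is at least
    `threshold`; otherwise it is called a hybrid.
    """
    top_breed = max(predicted, key=predicted.get)
    if predicted[top_breed] >= threshold:
        return top_breed
    return "hybrid"


if __name__ == "__main__":
    print(call_breed({"Yorkshire": 0.97, "Landrace": 0.02, "Duroc": 0.01}))
    print(call_breed({"Yorkshire": 0.52, "Landrace": 0.46, "Duroc": 0.02}))
```

Under this assumed rule, a lower threshold lets noisier predictions still count as purebred, while a higher threshold pushes borderline animals into the hybrid class. That trade-off is one plausible reading of why the abstract reports sensitivity to the threshold value.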
Source journal
Journal of Animal Science and Biotechnology (Agriculture, Dairy & Animal Science)
CiteScore: 9.90
Self-citation rate: 2.90%
Articles published: 822
Review time: 17 weeks
About the journal: Journal of Animal Science and Biotechnology is an open access, peer-reviewed journal that encompasses all aspects of animal science and biotechnology, including domestic animal production, animal genetics and breeding, animal reproduction and physiology, animal nutrition and biochemistry, feed processing technology and bioevaluation, animal biotechnology, and meat science.