Construction of SNP feature library for the identification of chicken breeds
Boxuan Zhang, Xiaochang Li, Xinwei Jiang, Conghao Zhong, Ning Yang, Congjiao Sun
Poultry Science, Volume 104, Issue 11, Article 105844. Published 2025-09-13. DOI: 10.1016/j.psj.2025.105844
https://www.sciencedirect.com/science/article/pii/S0032579125010855
Citations: 0
Abstract
Breed identification is an important prerequisite for the protection, development, and utilization of animal genetic resources. This study developed an accurate identification strategy for chicken breeds using whole-genome sequencing data from 492 individuals belonging to 14 chicken breeds. These breeds include eight local Chinese breeds (Tibetan chicken, Chahua chicken, Daweishan chicken, Liyang chicken, Lindian chicken, Silky chicken, Dongxiang blue-shell egg chicken, and Wenchang chicken), three standard breeds (Rhode Island Red, Leghorn, and Light Sussex), two commercial breeds (Cobb broiler and Yellow Plumage Dwarf chicken), and the Red Junglefowl. We compared three ancestry-informative marker (AIM) detection methods (Fst, I_n, and PCA-correlated SNPs) and four machine learning classifiers (K-Nearest Neighbors, Support Vector Machine, Random Forest, and XGBoost) to identify the best breed identification model.
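The I_n marker-ranking statistic named above is commonly defined as the informativeness for assignment of Rosenberg et al. (2003); the abstract gives no formula, so that definition is an assumption about what was computed here. A minimal sketch for biallelic SNPs follows, where the function names, the SNP identifiers, and the allele frequencies are all hypothetical:

```python
import math


def xlogx(p):
    """Return p * ln(p), using the convention 0 * ln(0) = 0."""
    return p * math.log(p) if p > 0.0 else 0.0


def informativeness(alt_freqs):
    """I_n for one biallelic SNP (assumed definition, see lead-in).

    alt_freqs holds the alternate-allele frequency in each of K breeds.
    The result ranges from 0 (same frequency everywhere) to ln(K).
    """
    k = len(alt_freqs)
    ref_freqs = [1.0 - p for p in alt_freqs]
    total = 0.0
    # Sum the contribution of each allele (alternate, then reference).
    for allele_freqs in (alt_freqs, ref_freqs):
        mean_freq = sum(allele_freqs) / k
        total += -xlogx(mean_freq) + sum(xlogx(p) for p in allele_freqs) / k
    return total


def rank_snps(freq_table, top_n):
    """Return the IDs of the top_n SNPs with the highest I_n."""
    ranked = sorted(
        freq_table,
        key=lambda snp_id: informativeness(freq_table[snp_id]),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical alternate-allele frequencies in three breeds.
example_freqs = {
    "snp_a": [0.5, 0.5, 0.5],  # uninformative: same frequency in every breed
    "snp_b": [0.0, 1.0, 0.0],  # fixed difference in one breed
    "snp_c": [0.4, 0.5, 0.6],  # weak differentiation
}
print(rank_snps(example_freqs, top_n=2))
```

A SNP with identical frequencies in every breed scores zero, while a SNP fixed for different alleles in different breeds scores highest, which is why a ranking of this kind can reduce a genome-wide call set to a small panel.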
A total of 30,831 highly informative single nucleotide polymorphisms (SNPs) were detected and selected from these breeds using the three AIM detection methods. We found that several AIM methods performed well, but I_n performed best. Machine learning classifiers were fitted to the selected SNP loci, and receiver operating characteristic (ROC) curves were generated to evaluate their performance. The ROC curves and 5-fold cross-validation results indicated that XGBoost was the best classifier, with the largest area under the curve (macro-AUC = 0.9996). In addition, XGBoost achieved 100% accuracy using only 238 SNPs.
As few as 238 SNPs were sufficient for effective breed identification, and the combination of XGBoost and I_n was the optimal strategy. This study provides a new method for breed identification, which is highly important for the breeding and preservation of animal genetic resources.
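The macro-AUC reported in the abstract is an average of per-breed AUC values. The abstract does not say how the multi-class problem was decomposed, so the sketch below assumes a one-vs-rest scheme with an unweighted mean; the function names and the probability values are hypothetical, and in a 5-fold cross-validation the inputs would be the predictions on held-out folds.

```python
def binary_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(true_breeds, prob_rows, breeds):
    """Unweighted mean of one-vs-rest AUCs (assumed averaging scheme).

    prob_rows[i][b] is the predicted probability that sample i
    belongs to breeds[b].
    """
    aucs = []
    for b, breed in enumerate(breeds):
        labels = [t == breed for t in true_breeds]
        scores = [row[b] for row in prob_rows]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Hypothetical held-out predictions for six birds from three breeds.
breeds = ["Leghorn", "Silky", "Cobb"]
true_breeds = ["Leghorn", "Leghorn", "Silky", "Silky", "Cobb", "Cobb"]
prob_rows = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.6, 0.2, 0.2],  # a Silky bird scored as a likely Leghorn
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
]
print(round(macro_auc(true_breeds, prob_rows, breeds), 4))
```

Because every breed contributes equally to the mean, a breed with few sequenced individuals weighs as much as a well-sampled one, which suits a panel where breed sample sizes differ.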
About the journal:
First self-published in 1921, Poultry Science is an internationally renowned monthly journal, known as the authoritative source for a broad range of poultry information and high-caliber research. The journal plays a pivotal role in the dissemination of preeminent poultry-related knowledge across all disciplines. As of January 2020, Poultry Science is an Open Access journal with no subscription charges, meaning authors who publish here can make their research immediately, permanently, and freely accessible worldwide while retaining copyright to their work. Papers submitted for publication after October 1, 2019 are published as Open Access papers.
An international journal, Poultry Science publishes original papers, research notes, symposium papers, and reviews of basic science as applied to poultry. This authoritative source of poultry information is consistently ranked by ISI Impact Factor as one of the top 10 agriculture, dairy and animal science journals to deliver high-caliber research. Currently it is the highest-ranked (by Impact Factor and Eigenfactor) journal dedicated to publishing poultry research. Subject areas include breeding, genetics, education, production, management, environment, health, behavior, welfare, immunology, molecular biology, metabolism, nutrition, physiology, reproduction, processing, and products.