Machine Learning-Aided Ultra-Low-Density Single Nucleotide Polymorphism Panel Helps to Identify the Tharparkar Cattle Breed: Lessons for Digital Transformation in Livestock Genomics
Authors: Harshit Kumar, Manjit Panigrahi, Dongwon Seo, Sunghyun Cho, Bharat Bhushan, Triveni Dutt
DOI: 10.1089/omi.2024.0153 (https://doi.org/10.1089/omi.2024.0153)
Published: 2024-10-01 (Epub 2024-09-20)
Publication type: Journal Article
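Abstract
Cattle breed identification is crucial for livestock research and sustainable food systems, and advances in genomics and artificial intelligence present new opportunities to address these challenges. This study investigates the identification of the Tharparkar cattle breed using genomics tools combined with machine learning (ML) techniques. Using data from the Bovine SNP 50K chip, we developed a breed-specific panel of single nucleotide polymorphisms (SNPs) for Tharparkar cattle and integrated data from seven other Indian cattle populations to make the panel more robust. Genome-wide association studies (GWAS) and principal component analysis were used to identify 500 SNPs, which were then refined with four ML models (AdaBoost, bagging tree, gradient boosting machines, and random forest) to determine the minimal number of SNPs needed for accurate breed identification. Panels of 23 and 48 SNPs achieved accuracy rates of 95.2% to 98.4%. Importantly, the identified SNPs were associated with key productive and adaptive traits, attesting to the value and potential of digital transformation in livestock genomics. The ML-aided ultra-low-density SNP panel approach reported here not only facilitates breed identification but also contributes to preserving genetic diversity and guiding future breeding programs.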
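The abstract describes a two-stage idea: rank candidate SNPs, then look for the smallest panel that still identifies the breed accurately. The sketch below illustrates only that general idea and is not the authors' method. It runs on synthetic genotypes, swaps in a simple mean allele-count difference for the GWAS ranking, and swaps in a nearest-centroid classifier for the ensemble models named in the abstract. All function names, allele frequencies, sample sizes, and the 0.95 threshold are assumptions made for illustration.

```python
import random

def make_freqs(n_snps, n_informative, rng):
    """Allele frequencies for two groups; only the first n_informative SNPs differ."""
    target, other = [], []
    for j in range(n_snps):
        if j < n_informative:
            target.append(0.9)   # assumed frequency in the target breed
            other.append(0.1)    # assumed frequency in all other breeds
        else:
            shared = rng.uniform(0.2, 0.8)
            target.append(shared)
            other.append(shared)
    return target, other

def draw_genotypes(freqs, n_animals, rng):
    """Synthetic genotypes: each entry is an allele count in {0, 1, 2}."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n_animals)]

def column_means(rows):
    """Mean allele count per SNP across animals."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def rank_snps(target_rows, other_rows):
    """Rank SNP indices by absolute difference in mean allele count, largest first."""
    mt, mo = column_means(target_rows), column_means(other_rows)
    return sorted(range(len(mt)), key=lambda j: abs(mt[j] - mo[j]), reverse=True)

def panel_accuracy(panel, train_t, train_o, test_t, test_o):
    """Nearest-centroid accuracy on the test animals, using only the SNPs in `panel`."""
    ct, co = column_means(train_t), column_means(train_o)

    def predict_target(row):
        dist_t = sum((row[j] - ct[j]) ** 2 for j in panel)
        dist_o = sum((row[j] - co[j]) ** 2 for j in panel)
        return dist_t < dist_o

    correct = sum(predict_target(r) for r in test_t)
    correct += sum(not predict_target(r) for r in test_o)
    return correct / (len(test_t) + len(test_o))

def smallest_panel(ranked, sizes, threshold, data):
    """Return (size, accuracy) for the first panel size meeting the threshold, else None."""
    for k in sizes:
        acc = panel_accuracy(ranked[:k], *data)
        if acc >= threshold:
            return k, acc
    return None

rng = random.Random(0)
freq_t, freq_o = make_freqs(n_snps=200, n_informative=10, rng=rng)
train_t = draw_genotypes(freq_t, 80, rng)
train_o = draw_genotypes(freq_o, 80, rng)
test_t = draw_genotypes(freq_t, 40, rng)
test_o = draw_genotypes(freq_o, 40, rng)

# Rank on training animals only, then test increasing panel sizes on held-out animals.
ranked = rank_snps(train_t, train_o)
result = smallest_panel(ranked, sizes=[1, 2, 5, 10, 20], threshold=0.95,
                        data=(train_t, train_o, test_t, test_o))
print(result)
```

Ranking on the training animals and scoring on held-out animals keeps the accuracy estimate from being inflated by the selection step. In a real analysis, the ranking and classifier would be replaced by the methods the study actually used.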