RASEL: An Ensemble Model for Selection of Core SNPs and Its Application for Identification and Classification of Cattle Breeds.

IF 1.6, Zone 4 (Biology), JCR Q4: BIOCHEMISTRY & MOLECULAR BIOLOGY

Biochemical Genetics Pub Date : 2025-08-22 DOI:10.1007/s10528-025-11230-z

K K Kanaka, Indrajit Ganguly, Sanjeev Singh, S V Kuralkar, Satpal Dixit, Nidhi Sukhija, Rangasai Chandra Goli


Citations: 0

Abstract

Identifying and classifying cattle populations by breed and utility is of great practical importance for effective breeding management. For accurate identification and classification of cattle breeds, a reference panel of 10 breeds, 657 identified ancestry informative markers, and several machine learning classifiers were employed. To boost the accuracy of breed identification, three distinct classification models (logistic regression, XGBoost, and random forest), each with an accuracy of > 95%, were combined into an ensemble that achieved an accuracy of > 98% with just 207 markers, termed breed informative markers (BIMs). Further, to classify dairy and draft purpose cattle, the BIMs were explored together with markers in selection signatures specific to dairy and draft utility, and an ensemble approach identified 17 utility informative markers (UIMs): 12 BIMs and 5 markers in selection signatures. The accuracy of classifying cattle by utility (dairy or draft) was > 96%. To demonstrate the application of UIMs, these markers were used to identify the utility of non-descript cattle of Maharashtra, India; many of these cattle were found to be draft purpose, consistent with their production performance. This information can further be used to make breeding decisions for grading them up to dairy or draft cattle. Here, a novel pipeline was developed that uses a [R-] reference panel, [A-] ancestry informative markers, [S-] selection signatures, and the power of [EL-] ensemble machine learning to identify and classify cattle by breed and by utility; we call it RASEL (available at: https://github.com/kkokay07/RASEL).
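The abstract does not say how the three classifiers were combined, so the following is only a minimal sketch of two common ensembling rules, hard (majority) voting and soft (probability-averaging) voting. The function names, the placeholder labels `breed_A` and `breed_B`, and the probability values are all hypothetical and are not taken from the RASEL repository.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the label predicted by most models for one animal.

    `predictions` is a list of labels, one per model. Ties are broken in
    favour of the model listed first (an arbitrary choice for this sketch).
    """
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


def soft_vote(probabilities):
    """Average per-class probabilities across models and pick the top class.

    `probabilities` is a list of dicts (one per model) mapping class label
    to predicted probability; every dict must use the same labels.
    Returns the winning label and the averaged probabilities.
    """
    classes = list(probabilities[0])
    mean = {c: sum(p[c] for p in probabilities) / len(probabilities)
            for c in classes}
    return max(mean, key=mean.get), mean


# Hypothetical outputs of logistic regression, XGBoost and random forest
# for a single animal (illustrative numbers only).
labels = ["breed_A", "breed_B", "breed_A"]
probs = [
    {"breed_A": 0.6, "breed_B": 0.4},
    {"breed_A": 0.3, "breed_B": 0.7},
    {"breed_A": 0.4, "breed_B": 0.6},
]

hard_label = hard_vote(labels)
soft_label, soft_mean = soft_vote(probs)
```

The two rules can disagree, as they do for this made-up animal: two of three models vote for `breed_A`, but the averaged probabilities favour `breed_B`. Which rule, if either, RASEL uses would need to be checked against the repository.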


Source journal

Biochemical Genetics (Biology: Biochemistry & Molecular Biology)

CiteScore: 3.90

Self-citation rate: 0.00%

Publication count: 133

Review time: 4.8 months

About the journal: Biochemical Genetics welcomes original manuscripts that address and test clear scientific hypotheses, are directed to a broad scientific audience, and clearly contribute to the advancement of the field through the use of sound sampling or experimental design, reliable analytical methodologies and robust statistical analyses. Although studies focusing on particular regions and target organisms are welcome, it is not the journal’s goal to publish essentially descriptive studies that provide results with narrow applicability, or are based on very small samples or pseudoreplication. Rather, Biochemical Genetics welcomes review articles that go beyond summarizing previous publications and create added value through the systematic analysis and critique of the current state of knowledge or by conducting meta-analyses. Methodological articles are also within the scope of Biochemical Genetics, particularly when new laboratory techniques or computational approaches are fully described and thoroughly compared with the existing benchmark methods. Biochemical Genetics welcomes articles on the following topics: Genomics; Proteomics; Population genetics; Phylogenetics; Metagenomics; Microbial genetics; Genetics and evolution of wild and cultivated plants; Animal genetics and evolution; Human genetics and evolution; Genetic disorders; Genetic markers of diseases; Gene technology and therapy; Experimental and analytical methods; Statistical and computational methods.