RASEL: An Ensemble Model for Selection of Core SNPs and Its Application for Identification and Classification of Cattle Breeds
K K Kanaka, Indrajit Ganguly, Sanjeev Singh, S V Kuralkar, Satpal Dixit, Nidhi Sukhija, Rangasai Chandra Goli
Biochemical Genetics (journal article), published 2025-08-22. DOI: https://doi.org/10.1007/s10528-025-11230-z
Abstract
Identifying and classifying cattle populations by breed and utility is of great practical importance for effective breeding management. For accurate identification and classification of cattle breeds, a reference panel of 10 breeds, 657 ancestry informative markers, and several machine learning classifiers were employed. To boost the accuracy of breed identification, three machine learning classification models (logistic regression, XGBoost, and random forest), each with an accuracy of >95%, were ensembled, achieving an accuracy of >98% with just 207 markers, termed breed informative markers (BIMs). Further, for classification of dairy and draft purpose cattle, the BIMs along with markers in selection signatures specific to dairy and draft utility were explored, and 17 utility informative markers (UIMs), comprising 12 BIMs and 5 markers in selection signatures, were identified using an ensemble approach. The accuracy of utility classification (dairy or draft) was >96%. To demonstrate the application of UIMs, these markers were used to identify the utility of non-descript cattle of Maharashtra, India; many of these cattle were found to be draft purpose, in agreement with their production performance. This information can further be used to guide breeding decisions for grading them up to dairy or draft cattle. Here, a novel pipeline was developed that uses [R-] reference panel, [A-] ancestry informative markers, [S-] selection signatures, and the power of [EL-] ensemble machine learning to identify and classify cattle by breed and utility; we call it RASEL (available at: https://github.com/kkokay07/RASEL).
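The ensembling step in the abstract can be illustrated with a minimal sketch. The abstract does not say how the three classifiers were combined, so hard majority voting is assumed here purely for illustration. The function names, the breed labels, and the toy predictions are hypothetical and are not taken from the study or from the RASEL repository.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by hard voting.

    Ties are broken in favour of the model listed first (an arbitrary,
    assumed rule; the source does not specify one).
    """
    ensembled = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top = max(counts.values())
        # The first label, in model order, that reaches the top count wins.
        ensembled.append(next(v for v in votes if counts[v] == top))
    return ensembled


def accuracy(predicted, truth):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Toy labels only (hypothetical breeds A, B, C); not data from the study.
truth = ["A", "B", "C", "A", "B"]
logistic_regression = ["A", "B", "C", "B", "B"]  # wrong on sample 4
xgboost_model = ["A", "B", "A", "A", "B"]        # wrong on sample 3
random_forest = ["A", "C", "C", "A", "B"]        # wrong on sample 2

ensemble = majority_vote([logistic_regression, xgboost_model, random_forest])

for name, preds in [
    ("logistic regression", logistic_regression),
    ("XGBoost", xgboost_model),
    ("random forest", random_forest),
    ("ensemble", ensemble),
]:
    print(f"{name}: accuracy = {accuracy(preds, truth):.2f}")
```

In this toy case each model errs on a different sample, so the vote corrects every individual mistake. That is the general reason an ensemble can outperform its members; the accuracies printed here say nothing about the figures reported in the paper.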
About the journal:
Biochemical Genetics welcomes original manuscripts that address and test clear scientific hypotheses, are directed to a broad scientific audience, and clearly contribute to the advancement of the field through the use of sound sampling or experimental design, reliable analytical methodologies and robust statistical analyses.
Although studies focusing on particular regions and target organisms are welcome, it is not the journal’s goal to publish essentially descriptive studies that provide results with narrow applicability, or are based on very small samples or pseudoreplication.
Rather, Biochemical Genetics welcomes review articles that go beyond summarizing previous publications and create added value through the systematic analysis and critique of the current state of knowledge or by conducting meta-analyses.
Methodological articles are also within the scope of Biochemical Genetics, particularly when new laboratory techniques or computational approaches are fully described and thoroughly compared with the existing benchmark methods.
Biochemical Genetics welcomes articles on the following topics: Genomics; Proteomics; Population genetics; Phylogenetics; Metagenomics; Microbial genetics; Genetics and evolution of wild and cultivated plants; Animal genetics and evolution; Human genetics and evolution; Genetic disorders; Genetic markers of diseases; Gene technology and therapy; Experimental and analytical methods; Statistical and computational methods.