Combined classifier for unknown genome classification using chaos game representation features
Authors: Vrinda V. Nair, A. Nair
Journal: In Silico Biology
Published: 2010-02-15
DOI: 10.1145/1722024.1722065
Classification of unknown genomes is widely applicable in evolutionary studies, biodiversity research, and forensic studies, fields that have lately been viewed from a renewed 'genomic' perspective. Few attempts at unknown genome identification appear in the literature, and the reported accuracies do not exceed 85%. Most works report classification into the major kingdoms only, without venturing further into their sub-classes. A novel technique combining Chaos Game Representation (CGR) and machine learning is proposed, the former for feature extraction and the latter for subsequent sequence classification. Eight subcategories of eukaryotic mitochondrial genomes from NCBI are used for the study. The sequences are first mapped into their Chaos Game Representation. Genomic features are then extracted by computing the Frequency Chaos Game Representation (FCGR) matrix. An order-3 FCGR matrix, which consists of 64 elements, is considered here, and this 64-element matrix acts as the feature descriptor for classification. The classification methods used are based on Difference Boosting Naïve Bayesian (DBNB), Artificial Neural Network (ANN), and Support Vector Machine (SVM) classifiers. Accuracies of the individual methods are reported. Although the SVM-CGR combination gives the highest average accuracy, other methods give better accuracies for some categories. Hence a voting classifier combining all three methods is implemented. Accuracies of 100% were obtained for Vertebrata and Porifera, whereas Acoelomata, Cnidaria, and Fungi were classified with accuracies above 90%. The accuracies obtained for Protostomia, Plant, and Pseudocoelomata were 90%, 82%, and 77%, respectively.
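The feature extraction and voting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a common CGR corner layout (A, C, G, T at the four corners of the unit square), skips k-mers containing ambiguous bases, normalizes counts to frequencies, and breaks voting ties in favor of the first-listed classifier; none of these choices is stated in the abstract. The function names `kmer_cell`, `fcgr`, `feature_vector`, and `majority_vote` are hypothetical. Counting k-mers directly is used here because it gives the same cell counts as binning CGR points into a 2^k by 2^k grid, without floating-point rounding.

```python
from collections import Counter

# Assumed corner layout (a common convention; not specified in the abstract).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def kmer_cell(kmer):
    """Map a k-mer to its (row, col) cell in the 2^k x 2^k FCGR grid.

    In CGR the most recent base selects the coarsest quadrant, so the
    last base of the k-mer contributes the most significant bit.
    """
    row = col = 0
    for j, base in enumerate(kmer):
        cx, cy = CORNERS[base]
        col |= cx << j
        row |= cy << j
    return row, col


def fcgr(seq, k=3):
    """Count k-mer occurrences into the FCGR grid (8 x 8 for k = 3)."""
    n = 2 ** k
    grid = [[0] * n for _ in range(n)]
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: k-mers with ambiguous bases (e.g. N) are ignored.
        if all(base in CORNERS for base in kmer):
            row, col = kmer_cell(kmer)
            grid[row][col] += 1
    return grid


def feature_vector(seq, k=3):
    """Flatten the FCGR grid into 4^k relative frequencies (64 for k = 3)."""
    grid = fcgr(seq, k)
    total = sum(map(sum, grid)) or 1  # avoid division by zero
    return [count / total for row in grid for count in row]


def majority_vote(predictions):
    """Return the label predicted by most classifiers.

    Assumption: on a tie, the label seen first in the list wins.
    """
    return Counter(predictions).most_common(1)[0][0]
```

In use, `feature_vector(genome_sequence)` would supply the 64 inputs to each of the three classifiers, and `majority_vote([dbnb_label, ann_label, svm_label])` would combine their outputs; the three label variables stand in for predictions from models that are not shown here.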
In Silico Biology
Subject area: Computer Science, Computational Theory and Mathematics
CiteScore: 2.20
Self-citation rate: 0.00%
Articles published: 1
About the journal:
The considerable "algorithmic complexity" of biological systems means that a huge amount of detailed information is required to describe them completely. Although far from complete, the overwhelming quantity of small pieces of information gathered for all kinds of biological systems at the molecular and cellular level requires computational tools to be adequately stored and interpreted. Interpreting data means abstracting them as far as is permissible, so as to provide a systematic, integrative view of biology. Most of the presently available scientific journals focus either on accumulating more data from elaborate experimental approaches, or on presenting new algorithms for the interpretation of these data. Both approaches are meritorious.