Ensemble learning-based classification of microarray cancer data on tree-based features
Guesh Dagnew, B.H. Shekar
Cognitive Computation and Systems, published 2021-02-25
DOI: 10.1049/ccs2.12003
https://onlinelibrary.wiley.com/doi/10.1049/ccs2.12003
Cancer is a group of related diseases with a high mortality rate, characterized by abnormal cell growth that attacks body tissues. Microarray cancer data is a prominent research topic across many disciplines, with work focused on addressing problems related to the curse of dimensionality, a small number of samples, noisy data and class imbalance. A random forest (RF) tree-based feature selection method and ensemble learning based on hard voting and soft voting are proposed to classify microarray cancer data using six different base classifiers. The features selected by the RF are submitted to the base classifiers as the training set. An ensemble learning method is then applied to the base classifiers, with each base classifier predicting the class label individually. The final prediction is made by hard and soft voting techniques, which use majority voting and weighted probability, respectively, on the test set. The proposed ensemble learning method is validated on eight standard microarray cancer datasets, of which three are binary-class and the remaining five are multi-class. Experimental results show that the proposed method achieves a classification accuracy of 1.00 on six of the datasets and 0.96 on the other two.