Ensemble learning-based classification of microarray cancer data on tree-based features

Impact factor: 1.2 | JCR: Q4 | Category: Computer Science, Artificial Intelligence
Guesh Dagnew, B.H. Shekar
Journal: Cognitive Computation and Systems
Published: 2021-02-25
DOI: 10.1049/ccs2.12003
Full text: https://onlinelibrary.wiley.com/doi/10.1049/ccs2.12003
Citations: 12

Abstract

Cancer is a group of related diseases with a high mortality rate, characterized by abnormal cell growth that attacks body tissues. Microarray cancer data is a prominent research topic across many disciplines, with work focused on problems related to the curse of dimensionality, small sample sizes, noisy data and class imbalance. A random forest (RF) tree-based feature selection method and ensemble learning based on hard voting and soft voting are proposed to classify microarray cancer data using six different base classifiers. The features selected by the RF are submitted to the base classifiers as the training set. An ensemble learning method is then applied to the base classifiers, with each base classifier predicting the class label individually. The final prediction on the test set is made by hard and soft voting, which use majority voting and weighted probability, respectively. The proposed ensemble learning method is validated on eight standard microarray cancer datasets, three of which are binary-class and five of which are multi-class. Experimental results show a classification accuracy of 1.00 on six of the datasets and 0.96 on the other two.
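
To make the difference between the two voting rules in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the three hypothetical base classifiers and their probability outputs are all assumptions chosen for illustration, and the RF feature-selection step is omitted. Hard voting takes the majority of the individual class predictions, while soft voting takes the class with the highest (optionally weighted) average probability.

```python
from collections import Counter


def hard_vote(probas):
    """Majority vote over each classifier's most probable class."""
    # Each classifier votes for the index of its largest probability.
    labels = [max(range(len(p)), key=p.__getitem__) for p in probas]
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probas, weights=None):
    """Class with the highest weighted-average probability."""
    if weights is None:
        weights = [1.0] * len(probas)  # assumption: equal weights by default
    total = sum(weights)
    n_classes = len(probas[0])
    avg = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=avg.__getitem__), avg


# Hypothetical class probabilities from three base classifiers
# for a single test sample in a binary problem (class 0 vs class 1).
probas = [
    [0.90, 0.10],  # classifier A is confident in class 0
    [0.40, 0.60],  # classifier B leans towards class 1
    [0.45, 0.55],  # classifier C leans towards class 1
]

hard_label = hard_vote(probas)
soft_label, avg = soft_vote(probas)
print("hard vote:", hard_label)
print("soft vote:", soft_label, [round(a, 3) for a in avg])
```

In this made-up sample the two rules disagree: two of the three classifiers pick class 1, so hard voting returns 1, but the single confident classifier pulls the averaged probability towards class 0, so soft voting returns 0. Passing `weights` to `soft_vote` gives the weighted-probability variant.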

Source journal: Cognitive Computation and Systems (Computer Science: Computer Science Applications)
CiteScore: 2.50
Self-citation rate: 0.00%
Articles published: 39
Review time: 10 weeks