Ensemble Case based Reasoning Imputation in Breast Cancer Classification

IF 0.5 | CAS Zone 4, Computer Science | JCR Q4, COMPUTER SCIENCE, INFORMATION SYSTEMS
Imane Chlioui, A. Idri, Ibtissam Abnane, M. Ezzat
Journal: Journal of Information Science and Engineering, vol. 27 1, pp. 1039-1051
Published: 2021-09-01 (Journal Article)
DOI: 10.6688/JISE.202109_37(5).0004 (https://doi.org/10.6688/JISE.202109_37(5).0004)
Source platform: Semanticscholar
Citations: 3

Abstract

Missing data (MD) is a common problem that affects breast cancer classification, so handling it is essential before building any breast cancer classifier. This paper presents the impact of ensemble Case-Based Reasoning (CBR) imputation on breast cancer classification. We evaluated the influence of CBR with parameter tuning and of ensemble CBR (E-CBR) under three missingness mechanisms (MCAR: missing completely at random, MAR: missing at random, and NMAR: not missing at random) and nine MD percentages (10% to 90%) on the accuracy rates of five classifiers: decision trees, random forest (RF), k-nearest neighbor, support vector machine, and multi-layer perceptron, over two Wisconsin breast cancer datasets. All experiments were implemented using the Weka 3.8 Java API; SPSS v20 was used for statistical tests. The findings confirmed that E-CBR yields better results than CBR for all five classifiers. The MD percentage negatively affects classifier performance: as the MD percentage increases, the accuracy rates of the classifiers decrease regardless of the MD mechanism and technique. RF with E-CBR outperformed all other combinations (MD technique, classifier), with accuracy rates of 89.72% for MCAR, 87.08% for MAR, and 86.84% for NMAR.
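The abstract does not spell out how CBR or E-CBR imputation works, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a missing value is filled from the k most similar complete cases (Euclidean distance over shared features, mean of the retrieved values) and that the ensemble averages the imputations obtained with several values of k. All function names here (`inject_mcar`, `cbr_impute`, `ensemble_cbr_impute`) are hypothetical, and the original experiments used the Weka Java API rather than Python.

```python
import math
import random


def inject_mcar(rows, rate, rng):
    """Return a copy of rows where each cell is set to None with probability `rate`.

    Every cell has the same chance of being removed, independent of any value,
    which is the MCAR (missing completely at random) mechanism.
    """
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def distance(a, b):
    """Root mean squared difference over the features observed in both cases."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common features: treat as maximally dissimilar
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def cbr_impute(rows, k):
    """Fill each missing cell with the mean of that feature over the k nearest cases."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Retrieve: other cases that have feature j observed, ranked by similarity.
            candidates = [(distance(row, other), other[j])
                          for m, other in enumerate(rows)
                          if m != i and other[j] is not None]
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            # Reuse: average the retrieved values; the cell stays None if none exist.
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def ensemble_cbr_impute(rows, ks=(1, 3, 5)):
    """Average the imputations produced by single CBR imputers with different k."""
    runs = [cbr_impute(rows, k) for k in ks]
    out = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is None:
                estimates = [run[i][j] for run in runs if run[i][j] is not None]
                if estimates:
                    out[i][j] = sum(estimates) / len(estimates)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    complete = [[float(i), float(10 * i)] for i in range(1, 11)]
    damaged = inject_mcar(complete, rate=0.2, rng=rng)
    print(ensemble_cbr_impute(damaged))
```

In a study like the one described, the imputed dataset would then be passed to each classifier and the resulting accuracy compared across missingness rates. MAR and NMAR injection would need a removal probability that depends on observed or unobserved values respectively, which this sketch does not cover.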
Source journal
Journal of Information Science and Engineering (Engineering and Technology: Computer Science, Information Systems)
CiteScore: 2.00
Self-citation rate: 0.00%
Articles published: 4
Review time: 8 months
About the journal: The Journal of Information Science and Engineering is dedicated to the dissemination of information on computer science, computer engineering, and computer systems. The journal encourages articles on original research in the areas of computer hardware, software, man-machine interface, theory, and applications; tutorial papers in the above-mentioned areas; and state-of-the-art papers on various aspects of computer systems and applications.