Influence of Feature Selection Methods on Breast Cancer Early Prediction Phase using Classification and Regression Tree

Asma Agaal, Mansour Essgaer
Published in: 2022 International Conference on Engineering & MIS (ICEMIS)
Publication date: 2022-07-04
DOI: https://doi.org/10.1109/ICEMIS56295.2022.9914078
Citations: 1

Abstract

In recent years, healthcare data has been growing exponentially, and the major challenge is to analyze this data and make predictions from it effectively. Feature selection addresses this challenge by selecting a subset of informative features from a high-dimensional dataset, which helps to increase accuracy and remove irrelevant features. In the medical domain, selecting the important features is essential because it directly affects human health. This study examines several filter, wrapper, and embedded feature selection techniques: generic univariate select, select percentile, select k best, Pearson correlation coefficient, mutual information, ReliefF, recursive feature elimination, recursive feature elimination with cross-validation, sequential forward selection, sequential backward selection, and select-from-model. The aim is to use these methods to make a classification and regression tree (CART) model more accurate at detecting breast cancer in its early stages. The data was collected from the Sebha oncology center in the south of Libya. The performance of the classification and regression tree improved noticeably when irrelevant features were eliminated. Using the optimal feature subset identified by recursive feature elimination, the model also outperformed other classification methods, namely logistic regression, naive Bayes, and k-nearest neighbors.
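
As a rough illustration of one of the filter methods listed in the abstract, the sketch below ranks features by the absolute Pearson correlation coefficient with the class label and keeps the k highest-scoring ones (a select-k-best step). It is a minimal sketch and not the authors' implementation: the function names `pearson` and `select_k_best` and the toy data are assumptions made for this example, and the data is not the Sebha dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_k_best(rows, labels, k):
    """Return the sorted indices of the k features with the largest |r|."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    # Highest score first; ties are broken by the lower feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return sorted(j for _, j in scores[:k])


# Hypothetical toy data: three features per sample, binary label.
# Feature 0 tracks the label closely, feature 1 is constant,
# feature 2 is only weakly related to the label.
rows = [
    [1.0, 5.0, 0.3],
    [2.0, 5.0, 0.9],
    [3.0, 5.0, 0.1],
    [4.0, 5.0, 0.8],
    [5.0, 5.0, 0.2],
    [6.0, 5.0, 0.7],
]
labels = [0, 0, 0, 1, 1, 1]

selected = select_k_best(rows, labels, k=2)
print("selected feature indices:", selected)
```

A filter like this scores each feature independently of any classifier. Wrapper methods such as recursive feature elimination instead refit the model repeatedly and drop the least important features at each round, which costs more computation but takes the classifier's behavior into account.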