Influence of Feature Selection Methods on Breast Cancer Early Prediction Phase using Classification and Regression Tree

2022 International Conference on Engineering & MIS (ICEMIS), published 2022-07-04, DOI: 10.1109/ICEMIS56295.2022.9914078

Asma Agaal, Mansour Essgaer

Citations: 1

Abstract

In recent years, healthcare data has been growing exponentially, and the major challenge is to analyze this data and make predictions from it effectively. Feature selection addresses this challenge by selecting a subset of informative features from a high-dimensional dataset; it helps increase accuracy and remove irrelevant features. In the medical domain, selecting the important features is essential because it directly affects human health. This study examines several filter, wrapper, and embedded feature selection techniques: generic univariate select, select percentile, select k best, Pearson correlation coefficient, mutual information, ReliefF, recursive feature elimination, recursive feature elimination with cross-validation, sequential forward selection, sequential backward selection, and select-from-model. The aim is to use these methods to make a classification and regression tree (CART) prediction model more accurate, so that breast cancer can be detected accurately in its early stages; the data were collected at the Sebha Oncology Center in southern Libya. The performance of CART improved noticeably when irrelevant features were eliminated. Furthermore, using the optimal feature subset identified by recursive feature elimination, our model outperforms other classification methods, namely logistic regression, naive Bayes, and k-nearest neighbors.
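The abstract contrasts filter methods, which score each feature without consulting the classifier, with wrapper methods, which search over feature subsets using a model's performance. The following is a minimal, dependency-free Python sketch of one of each, under stated assumptions: a Pearson-correlation ranking used in a select-k-best fashion, and sequential forward selection. The toy dataset (`TOY_X`, `TOY_Y`) and every function name are hypothetical; they are not the Sebha data or the authors' code. The wrapper scores subsets with a leave-one-out 1-nearest-neighbour classifier instead of CART, purely to keep the example short.

```python
from math import sqrt
from typing import Callable, List, Sequence

Row = Sequence[float]

# Hypothetical toy data (NOT the Sebha oncology data): 6 patients x 4 features.
# Feature 0 separates the classes, feature 2 is weakly related to the label,
# feature 1 is unrelated, and feature 3 is constant.
TOY_X: List[Row] = [
    [1.0, 5.0, 0.0, 2.0],
    [1.2, 1.0, 1.0, 2.0],
    [0.9, 3.0, 0.0, 2.0],
    [3.0, 5.0, 1.0, 2.0],
    [3.1, 1.0, 1.0, 2.0],
    [2.9, 3.0, 0.0, 2.0],
]
TOY_Y: List[int] = [0, 0, 0, 1, 1, 1]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_k_best_pearson(X: List[Row], y: List[int], k: int) -> List[int]:
    """Filter method: keep the k feature indices with the largest |r| to y."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def loo_1nn_accuracy(X: List[Row], y: List[int], features: List[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `features`."""
    if not features:
        return 0.0
    correct = 0
    for i, row in enumerate(X):
        best_j, best_d = -1, float("inf")
        for j, other in enumerate(X):
            if i == j:
                continue  # leave the query row out of its own neighbour search
            d = sum((row[f] - other[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += int(y[best_j] == y[i])
    return correct / len(X)


def sequential_forward_selection(
    X: List[Row],
    y: List[int],
    score: Callable[[List[Row], List[int], List[int]], float],
    max_features: int,
) -> List[int]:
    """Wrapper method: greedily add the feature that most improves `score`."""
    selected: List[int] = []
    best_score = 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scores = {f: score(X, y, selected + [f]) for f in remaining}
        cand = max(remaining, key=scores.__getitem__)
        if scores[cand] <= best_score:
            break  # no remaining feature improves the score; stop early
        selected.append(cand)
        remaining.remove(cand)
        best_score = scores[cand]
    return sorted(selected)


if __name__ == "__main__":
    print("filter (top-2 by |r|):", select_k_best_pearson(TOY_X, TOY_Y, k=2))
    print("wrapper (SFS):", sequential_forward_selection(
        TOY_X, TOY_Y, loo_1nn_accuracy, max_features=2))
```

On this toy data the filter returns `[0, 2]` (feature 0 is strongly correlated with the label, feature 2 weakly, features 1 and 3 not at all), while the wrapper returns `[0]`: feature 0 alone already reaches a leave-one-out accuracy of 1.0, so no second feature can improve it and the search stops. In practice, scikit-learn provides `SelectKBest`, `SequentialFeatureSelector`, `RFE`, and `DecisionTreeClassifier` for these steps.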
