Assessing Feature Selection Techniques for Machine Learning Models using Cardiac Dataset

2022 IEEE Fifth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE). Pub Date: 2022-09-01. DOI: 10.1109/AIKE55402.2022.00027

Shital Patil, Surendra Bhosale


Citations: 1

Abstract

Cardiac disorders are the leading causes of morbidity and mortality in the world, accounting for a large number of deaths over the last few decades, and have emerged as the most life-threatening disease globally. Machine learning (ML) and artificial intelligence (AI) have been playing a key role in predicting heart disease, and a relevant set of features can help predict the disease accurately. In this study, we propose a comparative analysis of four different feature selection methods and evaluate their performance on both the raw (unbalanced) and the sampled (balanced) dataset. The study uses the publicly available Z-Alizadeh Sani dataset. The four feature selection techniques include Data Analysis, minimum Redundancy Maximum Relevance (mRMR), and Recursive Feature Elimination (RFE). Each method is tested with eight different classification models to find the most accurate combination. On both the balanced and unbalanced datasets, the results are promising across several performance metrics for predicting heart disease. On the raw data, the proposed method achieves a maximum AUC of 100%, a maximum F1 score of 94%, a maximum sensitivity (SENS) of 98%, and a maximum precision (PREC) of 93%. On the balanced dataset, it achieves a maximum AUC of 100%, an F1 score of 95%, a maximum SENS of 95%, and a maximum PREC of 97%.
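The abstract names mRMR but does not say which form of the criterion is used. As a minimal sketch, assuming the common difference (MID) form and features that have already been discretized, greedy mRMR picks at each step the feature with the highest mutual information with the class label minus its average mutual information with the features already chosen. The functions `mutual_information` and `mrmr` and the toy columns are hypothetical illustrations, not the authors' implementation or data from the Z-Alizadeh Sani dataset.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information I(X; Y) in bits between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))  # joint counts n(a, b)
    px = Counter(x)             # marginal counts n(a)
    py = Counter(y)             # marginal counts n(b)
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a,b) * log2(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (n_ab / n) * log2(n_ab * n / (px[a] * py[b]))
    return mi


def mrmr(features, target, k):
    """Greedy mRMR selection using the difference (MID) criterion (an assumed variant).

    features: dict mapping a feature name to its discretized column
    target:   class labels, one per sample
    k:        number of features to select
    Returns the selected feature names in the order they were picked.
    """
    # Relevance: how much each feature tells us about the class label.
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected = []
    remaining = sorted(features)  # sorted so ties are broken deterministically

    def score(name):
        if not selected:
            return relevance[name]
        # Redundancy: average MI with the features already chosen.
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # max() keeps the first of tied names
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data (8 samples), not taken from the Z-Alizadeh Sani dataset.
labels = [0, 0, 0, 1, 1, 1, 1, 1]
toy = {
    "strong":      [0, 0, 0, 0, 1, 1, 1, 1],  # most informative about labels
    "strong_copy": [0, 0, 0, 0, 1, 1, 1, 1],  # exact duplicate: fully redundant
    "weak":        [0, 1, 0, 1, 0, 1, 0, 1],  # less informative, independent of "strong"
}

if __name__ == "__main__":
    print(mrmr(toy, labels, 2))  # ['strong', 'weak']
```

In the toy data, `strong_copy` is exactly as relevant as `strong`, so a ranking by relevance alone would keep both. mRMR instead penalizes the duplicate and selects the weaker but non-redundant `weak`. RFE works differently: it repeatedly fits a classifier and drops the lowest-ranked features, so it depends on the chosen model.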


