Assessing Feature Selection Techniques for Machine Learning Models using Cardiac Dataset

Shital Patil, Surendra Bhosale
2022 IEEE Fifth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), September 2022. DOI: 10.1109/AIKE55402.2022.00027

Abstract

Cardiac disorders are the leading cause of morbidity and mortality worldwide. They have accounted for a large number of deaths over the last few decades and have emerged as the most life-threatening disease globally. Machine learning and artificial intelligence have been playing a key role in predicting heart disease, and a relevant set of features can help predict the disease accurately. In this study, we present a comparative analysis of four feature selection methods and evaluate their performance on both the raw (unbalanced) and the sampled (balanced) dataset. The study uses the publicly available Z-Alizadeh Sani dataset. The feature selection techniques include Data Analysis, minimum Redundancy Maximum Relevance (mRMR), and Recursive Feature Elimination (RFE). These methods are tested with eight different classification models to obtain the best possible accuracy. On both the balanced and the unbalanced dataset, the study shows promising results across several performance metrics for predicting heart disease. With the raw data, the proposed method achieves a maximum AUC of 100%, a maximum F1 score of 94%, a maximum sensitivity (SENS) of 98%, and a maximum precision (PREC) of 93%. With the balanced dataset, it achieves a maximum AUC of 100%, an F1 score of 95%, a maximum SENS of 95%, and a maximum PREC of 97%.
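The abstract names mRMR but does not say which variant, which mutual-information estimator, or how the continuous attributes of Z-Alizadeh Sani were discretized. As a minimal sketch, assuming already-discrete (binned) features and the difference form of mRMR (relevance to the class minus mean redundancy with already-selected features), the greedy selection can be written in pure Python. The feature names `a`, `a_copy`, `b` and all their values are hypothetical toy data, not columns from the dataset, and this is not presented as the authors' implementation.

```python
from collections import Counter
from math import log
from typing import Dict, Hashable, List, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Empirical mutual information I(X;Y) in nats between two discrete sequences."""
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (xv, yv), c in count_xy.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log(c * n / (count_x[xv] * count_y[yv]))
    return mi


def mrmr(features: Dict[str, Sequence[Hashable]], target: Sequence[Hashable], k: int) -> List[str]:
    """Greedy mRMR (difference form): maximise relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected: List[str] = []
    remaining = sorted(features)  # sorted so that ties break by name, deterministically

    def score(name: str) -> float:
        if not selected:  # first pick: relevance only
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data (NOT from Z-Alizadeh Sani): "a_copy" duplicates "a",
# "b" is weakly informative about y but statistically independent of "a".
y = [0, 0, 1, 0, 1, 1, 1, 1]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
}

# Baseline for contrast: rank by relevance alone, ignoring redundancy
top_by_relevance = sorted(features, key=lambda f: -mutual_information(features[f], y))[:2]
print(top_by_relevance)        # ['a', 'a_copy']
print(mrmr(features, y, k=2))  # ['a', 'b']
```

Because `a_copy` is exactly as relevant as `a`, ranking by relevance alone keeps both; the redundancy term penalizes `a_copy` once `a` is selected, so mRMR prefers the weaker but non-redundant `b`.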