Empirical analysis of predicting heart disease using diverse datasets and classification procedures of machine learning

Impact Factor: 6.0 | Zone 2 (Engineering Technology) | JCR Q1: ENGINEERING, MULTIDISCIPLINARY
Geetha Narasimhan, Akila Victor
Ain Shams Engineering Journal, volume 16, issue 8, Article 103470. Published 2025-05-16.
DOI: 10.1016/j.asej.2025.103470
https://www.sciencedirect.com/science/article/pii/S2090447925002114
Citations: 0

Abstract

Cardiovascular disease (CVD) is a major threat because of its complexity and high fatality, which makes early intervention essential. The rapidly evolving field of machine learning (ML) offers a range of algorithms for disease diagnosis and prediction. This research develops and identifies a model to assist radiologists in predicting heart disease, using a two-phase approach. Phase 1, feature selection: the SelectKBest method selects the most relevant features for prediction. Features are ranked individually under three statistical tests (chi-squared, mutual information, and the F-statistic), and the final selection is based on each feature's overall rank. Phase 2, classification: a range of classification algorithms is applied, including Random Forest, k-nearest neighbors (KNN), decision tree (DT), support vector machine (SVM), Naïve Bayes (NB), Logistic Regression (LR), Random Forest with grid search, Gradient Boosting, and a neural network. The models are evaluated on three standard heart disease datasets, Cleveland, Faisalabad, and Framingham, retrieved from UCI and Kaggle. Each dataset is pre-processed before SelectKBest feature selection and the ML algorithms are applied. Random Forest achieved the highest accuracy on all three datasets (90.16%, 90%, and 84%, respectively) and showed consistently lower classification error than the other algorithms. The results highlight the effectiveness of feature selection, particularly the filter-based SelectKBest method, in improving the accuracy of heart disease prediction with ML models such as Random Forest. This paves the way for integrating such models into clinical settings, giving radiologists decision-support tools for early CVD detection and intervention.
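The two phases described in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation. The abstract does not say how the three per-test rankings are combined, so summing the ranks is an assumption here, as are the synthetic data from `make_classification` (standing in for the real datasets), the choice of k = 8, the min-max scaling, the subset of classifiers, and all hyperparameters. The helper names `rank_aggregate_select` and `demo` are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier


def rank_aggregate_select(X, y, k, random_state=0):
    """Return the sorted indices of the k features with the best summed rank.

    Assumption: the overall rank is the sum of the per-test ranks.
    X must be non-negative because chi2 requires it.
    """
    chi2_scores, _ = chi2(X, y)
    f_scores, _ = f_classif(X, y)
    mi_scores = mutual_info_classif(X, y, random_state=random_state)
    # Rank 1 is the highest score under each test; ties get the average rank.
    ranks = [rankdata(-s, method="average")
             for s in (chi2_scores, f_scores, mi_scores)]
    overall = np.sum(ranks, axis=0)
    # Lowest summed rank is best.
    return np.sort(np.argsort(overall, kind="stable")[:k])


def demo(k=8, random_state=0):
    """Phase 1 (selection) then Phase 2 (classification) on synthetic data."""
    # Synthetic stand-in with 13 features; NOT one of the paper's datasets.
    X, y = make_classification(n_samples=300, n_features=13, n_informative=5,
                               n_redundant=2, random_state=random_state)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state)

    # Scale to [0, 1] using training data only, so chi2 gets valid input.
    scaler = MinMaxScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    # Phase 1: select features on the training split only.
    selected = rank_aggregate_select(X_train, y_train, k, random_state)

    # Phase 2: fit a few of the classifiers named in the abstract.
    models = {
        "Random Forest": RandomForestClassifier(n_estimators=200,
                                                random_state=random_state),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Logistic Regression": LogisticRegression(max_iter=1000),
    }
    accuracies = {}
    for name, model in models.items():
        model.fit(X_train[:, selected], y_train)
        predictions = model.predict(X_test[:, selected])
        accuracies[name] = accuracy_score(y_test, predictions)
    return selected, accuracies


selected, accuracies = demo()
print("Selected feature indices:", selected.tolist())
for name, accuracy in accuracies.items():
    print(f"{name}: accuracy = {accuracy:.3f}")
```

Selecting features on the training split alone keeps the test labels out of the ranking step. The accuracies printed here come from synthetic data and say nothing about the figures reported in the abstract.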
Source journal
Ain Shams Engineering Journal (Engineering: General Engineering)
CiteScore: 10.80
Self-citation rate: 13.30%
Articles published: 441
Review time: 49 weeks
About the journal: Ain Shams Engineering Journal is an international journal devoted to the publication of peer-reviewed, original, high-quality research papers and review papers, both on traditional topics and on emerging science and technology. It welcomes work of theoretical and fundamental interest as well as work on industrial applications, emerging instrumental techniques, and practical applications to aspects of human endeavor such as environmental preservation, health, and waste disposal. The overall focus is on original and rigorous scientific research results of generic significance. The journal covers mechanical engineering, electrical engineering, civil engineering, chemical engineering, petroleum engineering, environmental engineering, and architectural and urban planning engineering. Papers that integrate knowledge from other disciplines with engineering are especially welcome, for example nanotechnology, material sciences, and computational methods, as well as the applied basic sciences: engineering mathematics, physics, and chemistry.