Empirical analysis of predicting heart disease using diverse datasets and classification procedures of machine learning

Authors: Geetha Narasimhan, Akila Victor
Journal: Ain Shams Engineering Journal, Volume 16, Issue 8, Article 103470 (published 2025-05-16)
Impact factor: 6.0; JCR: Q1 (Engineering, Multidisciplinary)
DOI: 10.1016/j.asej.2025.103470
URL: https://www.sciencedirect.com/science/article/pii/S2090447925002114
Cardiovascular disease (CVD) poses a significant threat due to its complexity and fatality, necessitating early intervention. The rapidly evolving field of machine learning (ML) offers an array of algorithms for disease diagnosis and prediction. This research aims to develop and identify a model that assists radiologists in predicting heart disease, using a two-phase approach.

Phase 1: Feature Selection with SelectKBest. The first phase uses the SelectKBest method to select the most relevant features for prediction. Features are ranked individually under three statistical tests (chi-squared, mutual information, and F-statistic), and the final selection is based on the overall rank each feature obtains across the three tests.

Phase 2: Classification Algorithm Exploration. The second phase applies a range of classification algorithms: Random Forest, k-nearest neighbors (KNN), decision tree (DT), support vector machine (SVM), Naïve Bayes (NB), Logistic Regression (LR), Random Forest with grid search, Gradient Boosting, and a Neural Network. The models are evaluated on three standard heart disease datasets, Cleveland, Faisalabad, and Framingham, retrieved from UCI and Kaggle. Each dataset is pre-processed before SelectKBest feature selection and the ML algorithms are applied.

Random Forest performed best on all three datasets, achieving accuracies of 90.16%, 90%, and 84%, respectively, and it showed consistently lower classification errors than the other algorithms. These results highlight the effectiveness of feature selection, particularly the filter-based SelectKBest method, in improving the accuracy of heart disease prediction with machine learning models such as Random Forest. This paves the way for integrating such models into clinical settings, giving radiologists decision-support tools for early CVD detection and intervention.
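The rank-combination step of Phase 1 can be illustrated with a minimal sketch. The abstract does not say how the three per-test rankings are merged, so the sum-of-ranks rule below is an assumption, as are the helper names `rank_features` and `select_by_overall_rank`. The scores are toy values invented for illustration, not results from the paper; in practice they might come from scikit-learn's `chi2`, `mutual_info_classif`, and `f_classif` score functions.

```python
from typing import Dict, List


def rank_features(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each feature to its rank under one test (1 = highest score)."""
    # Ties are broken alphabetically so the result is deterministic.
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: position for position, name in enumerate(ordered, start=1)}


def select_by_overall_rank(
    score_tables: Dict[str, Dict[str, float]], k: int
) -> List[str]:
    """Sum each feature's per-test ranks and keep the k lowest totals.

    Assumption: the 'overall rank' is the plain sum of per-test ranks.
    """
    totals: Dict[str, int] = {}
    for scores in score_tables.values():
        for name, rank in rank_features(scores).items():
            totals[name] = totals.get(name, 0) + rank
    ordered = sorted(totals, key=lambda name: (totals[name], name))
    return ordered[:k]


# Toy scores, invented purely for illustration (hypothetical features f1..f4).
toy_scores = {
    "chi2": {"f1": 12.0, "f2": 3.5, "f3": 8.1, "f4": 0.4},
    "mutual_info": {"f1": 0.20, "f2": 0.05, "f3": 0.25, "f4": 0.01},
    "f_statistic": {"f1": 30.0, "f2": 9.0, "f3": 22.0, "f4": 1.2},
}

selected = select_by_overall_rank(toy_scores, k=2)
print(selected)
```

Summing ranks rather than raw scores keeps the three tests comparable even though their scores are on different scales. Other aggregation rules, such as mean rank or majority voting, would fit the abstract's description equally well.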
About the journal:
Ain Shams Engineering Journal is an international journal devoted to the publication of peer-reviewed, original, high-quality research papers and review papers on both traditional topics and those of emerging science and technology. It welcomes work of theoretical and fundamental interest as well as work concerning industrial applications, emerging instrumental techniques, and practical applications to aspects of human endeavor such as preservation of the environment, health, and waste disposal. The overall focus is on original and rigorous scientific research results of generic significance.
Ain Shams Engineering Journal focuses on mechanical, electrical, civil, chemical, petroleum, and environmental engineering, as well as architectural and urban planning engineering. Papers that integrate knowledge from other disciplines with engineering are especially welcome, for example nanotechnology, material sciences, and computational methods, as well as applied basic sciences: engineering mathematics, physics, and chemistry.