Heart Disease Classification Based on Hybrid Ensemble Stacking Technique
Authors: Ahmed El sheikh, Nader Mahmoud, A. Keshk
Journal: IJCI. International Journal of Computers and Information
Publication date: 2021-12-01
Publication type: Journal Article
DOI: 10.21608/ijci.2021.207732 (https://doi.org/10.21608/ijci.2021.207732)
Citation count: 4
Abstract
Heart disease has been one of the leading causes of death worldwide in recent decades, so its early diagnosis and prediction have become a critical subject in the medical domain. Data mining techniques are commonly used to find anomalies, patterns, and correlations within large data sets, which makes them crucial for clinical data analysis and the prediction of various diseases. Ensemble approaches have proven to be quite effective in solving a variety of classification problems. In this research, we propose a hybrid ensemble stacking model combined with different feature engineering algorithms. The proposed ensemble model for heart disease diagnosis is built on five base models: Random Forest, Decision Tree, K-Nearest Neighbour (KNN), Support Vector Machine (SVM), and Naïve Bayes. A Logistic Regression meta-model is used to merge the base models' predictions. We have examined several feature selection approaches: Brute Force, Principal Component Analysis (PCA), Classification and Regression Tree (CART) Feature Importance, and Logistic Regression based Recursive Feature Elimination. The proposed approach has been experimentally validated and evaluated on two datasets: UCI Cleveland and UCI Statlog. A quantitative evaluation shows that combining the ensemble model with Brute Force feature selection yields a top accuracy of 97.8% for heart disease classification. The proposed stacking model has proven its effectiveness and outperforms existing approaches in heart disease classification.

Keywords: Heart Disease; Data Mining; Classification; Ensemble Learning; Stacking; Feature Selection.
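
The stacking architecture described in the abstract (five base classifiers whose predictions are merged by a Logistic Regression meta-model) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn's `StackingClassifier`, default hyperparameters, 5-fold internal cross-validation, feature scaling for KNN and SVM, and a synthetic dataset from `make_classification` standing in for the UCI Cleveland and UCI Statlog data. None of these choices are stated in the abstract, and the sketch does not include any of the feature selection steps.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_stacking_model(random_state=0):
    """Five base models merged by a Logistic Regression meta-model.

    Hyperparameters are illustrative assumptions, not the paper's settings.
    """
    base_models = [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
        ("dt", DecisionTreeClassifier(random_state=random_state)),
        # KNN and SVM are distance/margin based, so they are scaled here (assumption).
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        ("svm", make_pipeline(StandardScaler(), SVC(random_state=random_state))),
        ("nb", GaussianNB()),
    ]
    return StackingClassifier(
        estimators=base_models,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,  # out-of-fold base predictions are used to train the meta-model
    )


# Synthetic binary classification data; a placeholder for the real heart disease datasets.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = build_stacking_model()
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

The printed accuracy describes only the synthetic data and is unrelated to the 97.8% reported in the abstract. Training the meta-model on out-of-fold predictions (the `cv` argument) is the usual way to keep it from simply learning the base models' training-set overfit.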