Heart Disease Classification Based on Hybrid Ensemble Stacking Technique

IJCI. International Journal of Computers and Information. Published: 2021-12-01. DOI: 10.21608/ijci.2021.207732

Ahmed El sheikh, Nader Mahmoud, A. Keshk


Citations: 4

Abstract

Heart disease has been one of the leading causes of death for humanity in recent decades, so its early diagnosis and prediction have become a critical subject in the medical domain. Data mining techniques are commonly used to find anomalies, patterns, and correlations within large data sets, which makes them crucial for clinical data analysis and disease prediction. Ensemble approaches have proven quite effective across a variety of classification problems. In this research, we propose a hybrid ensemble stacking model combined with different feature engineering algorithms for heart disease diagnosis. The proposed ensemble is built on five base models: Random Forest, Decision Tree, K-Nearest Neighbour (KNN), Support Vector Machine (SVM), and Naïve Bayes. A Logistic Regression meta-model merges the base models' predictions. We examined several feature selection approaches: Brute Force, Principal Component Analysis (PCA), Classification and Regression Tree (CART) feature importance, and Logistic Regression based Recursive Feature Elimination (RFE). The proposed approach has been experimentally validated and evaluated on two datasets: UCI Cleveland and UCI Statlog. A quantitative evaluation shows that combining the ensemble model with brute force feature selection yields a top accuracy of 97.8% for heart disease classification. The proposed stacking model thus proves its efficiency and outperforms existing approaches to heart disease classification.

Keywords: Heart Disease; Data Mining; Classification; Ensemble Learning; Stacking; Feature Selection.
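The abstract describes the stacking architecture only at a high level. The sketch below shows one way such a model could be assembled with scikit-learn; it is not the authors' implementation and will not reproduce the reported 97.8% accuracy. Assumptions: synthetic data from `make_classification` (303 samples x 13 features) stands in for the UCI Cleveland/Statlog data; the hyperparameters (`n_estimators=100`, `n_neighbors=5`, `n_features_to_select=8`, `cv=5`) are illustrative choices, not the paper's settings; and Logistic Regression based RFE represents just one of the four feature-selection options examined.

```python
# Sketch only: five base models stacked under a Logistic Regression meta-model,
# preceded by LR-based Recursive Feature Elimination. Synthetic data is an
# assumption standing in for the UCI heart disease datasets.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data: 303 samples, 13 features, binary target.
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The five base models named in the abstract (hyperparameters are illustrative).
base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("svm", SVC(random_state=0)),
    ("nb", GaussianNB()),
]

# With cv=5, the meta-model is trained on out-of-fold predictions of the base
# models, so it does not simply learn from predictions on their own training data.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Scale all features (KNN and SVM are distance/margin based), keep the 8 features
# ranked highest by LR-based RFE, then fit the stacking ensemble on them.
model = make_pipeline(
    StandardScaler(),
    RFE(LogisticRegression(max_iter=1000), n_features_to_select=8),
    stack,
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
selected = model.named_steps["rfe"].support_  # boolean mask over the 13 features
print(f"selected features: {selected.sum()}, test accuracy: {accuracy:.3f}")
```

To try another feature-selection variant, the `RFE` step could be swapped, for example, for `PCA(n_components=8)` or for `SelectFromModel(DecisionTreeClassifier())` as a CART-importance filter (the `named_steps` key then changes accordingly). A brute-force search would instead loop over candidate feature subsets and score each one with cross-validation, which grows exponentially with the number of features.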



