Comparison of Different Machine Learning Algorithms for the Prediction of Coronary Artery Disease

Journal of Data Analysis and Information Processing  Pub Date: 2020-04-09  DOI: 10.4236/jdaip.2020.82003

Imran Chowdhury Dipto, Tanzila Islam, H. Rahman, A. F. M. Moshiur Rahman


Citations: 13

Abstract

Coronary Artery Disease (CAD) is the leading cause of mortality worldwide. It is a complex heart disease associated with numerous risk factors and a variety of symptoms. During the past decade, CAD has undergone a remarkable evolution. The purpose of this research is to build a prototype system using different Machine Learning algorithms (models) and to compare their performance in order to identify a suitable model. This paper explores three of the most commonly used Machine Learning algorithms: Logistic Regression, Support Vector Machine and Artificial Neural Network. A clinical dataset was used to conduct the research. Performance was evaluated using several methods: Confusion Matrix, Stratified K-Fold Cross-Validation, Accuracy, AUC and ROC. The accuracy and AUC scores were validated using K-Fold Cross-Validation. The dataset is class-imbalanced, so the SMOTE algorithm was used to balance it, and the performance analysis was carried out on both the original and the balanced data. The results show that the accuracy scores of all models increased when they were trained on the balanced dataset. Overall, Artificial Neural Network has the highest accuracy, whereas Logistic Regression is the least accurate of the trained algorithms.
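The abstract names SMOTE and Stratified K-Fold Cross-Validation but does not describe how they were implemented. The following is a minimal from-scratch sketch of both ideas using only the Python standard library, not the authors' code and not the imbalanced-learn or scikit-learn implementations. The function names `smote` and `stratified_folds`, the toy data, and the choice to oversample only inside each training fold are all assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def stratified_folds(labels, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose class proportions stay
    close to those of the full label list."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(n_splits)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_splits].append(i)  # deal each class round-robin
    for f in range(n_splits):
        test = sorted(folds[f])
        train = sorted(i for g in range(n_splits) if g != f for i in folds[g])
        yield train, test


# Hypothetical toy data: 40 negatives and 10 positives with 2 features each.
data_rng = random.Random(42)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(40)]
X += [(data_rng.gauss(2, 1), data_rng.gauss(2, 1)) for _ in range(10)]
y = [0] * 40 + [1] * 10

balanced_counts = []
for train_idx, test_idx in stratified_folds(y, n_splits=5):
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    # Assumption: oversample the training fold only, so every test fold
    # contains real samples exclusively.
    minority = [x for x, label in zip(X_train, y_train) if label == 1]
    n_new = y_train.count(0) - y_train.count(1)
    X_train += smote(minority, n_new, k=3)
    y_train += [1] * n_new
    balanced_counts.append(Counter(y_train))
    # A classifier would be fitted on (X_train, y_train) here and scored
    # on the untouched test fold.
```

In this sketch each training fold ends up with equal class counts, while the held-out fold keeps the original imbalance. Whether the paper balanced the data before or after splitting is not stated in the abstract.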


