Comparison of Different Machine Learning Algorithms for the Prediction of Coronary Artery Disease

Imran Chowdhury Dipto, Tanzila Islam, H. Rahman, A. F. M. Moshiur Rahman
Journal of Data Analysis and Information Processing (数据分析和信息处理(英文)), published 2020-04-09
DOI: 10.4236/jdaip.2020.82003 (https://doi.org/10.4236/jdaip.2020.82003)
Citations: 13

Abstract

Coronary Artery Disease (CAD) is the leading cause of mortality worldwide. It is a complex heart disease associated with numerous risk factors and a variety of symptoms. During the past decade, CAD has undergone a remarkable evolution. The purpose of this research is to build a prototype system using different machine learning algorithms (models) and to compare their performance in order to identify a suitable model. This paper explores three of the most commonly used machine learning algorithms: Logistic Regression, Support Vector Machine, and Artificial Neural Network. A clinical dataset was used to conduct the research. The evaluation methods include the confusion matrix, stratified K-fold cross-validation, accuracy, AUC, and ROC. The accuracy and AUC scores were validated using K-fold cross-validation. The dataset is class-imbalanced, so the SMOTE algorithm was used to balance it, and the performance analysis was carried out on both the original and the balanced data. The results show that the accuracy scores of all the models increased when they were trained on the balanced dataset. Overall, Artificial Neural Network has the highest accuracy, whereas Logistic Regression is the least accurate of the trained algorithms.
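
The two data-handling steps named in the abstract, SMOTE oversampling and stratified K-fold splitting, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `smote` and `stratified_folds`, the toy two-feature data, and the parameter values are all hypothetical, and the abstract does not say whether oversampling was applied before or after splitting. Here it is applied inside each training fold so that synthetic points never reach the held-out fold.

```python
import math
import random
from collections import defaultdict


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def stratified_folds(labels, n_splits=5, seed=0):
    """Return n_splits lists of test indices with roughly equal class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin so every fold gets its share
        for j, i in enumerate(indices):
            folds[j % n_splits].append(i)
    return folds


# Hypothetical toy data: 20 majority samples (label 0), 6 minority samples (label 1)
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(20)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(6)]
y = [0] * 20 + [1] * 6

folds = stratified_folds(y, n_splits=3)
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(y)) if i not in held_out]
    minority = [X[i] for i in train_idx if y[i] == 1]
    majority_count = sum(1 for i in train_idx if y[i] == 0)
    # Oversample the minority class until the training fold is balanced
    new_points = smote(minority, majority_count - len(minority), k=3)
    X_train = [X[i] for i in train_idx] + new_points
    y_train = [y[i] for i in train_idx] + [1] * len(new_points)
    # A classifier would be fit on (X_train, y_train) and scored on test_idx here
```

In practice the same steps are usually taken from established libraries rather than written by hand; the sketch only shows what each step does to the data before a model such as Logistic Regression, a Support Vector Machine, or a neural network is trained on it.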