An Ensemble Hard Voting Model for Cardiovascular Disease Prediction

Al-Zadid Sultan Bin Habib, Tanpia Tasnim
Published in: 2020 2nd International Conference on Sustainable Technologies for Industry 4.0 (STI), 19 December 2020. DOI: 10.1109/STI50764.2020.9350514
Citations: 7

Abstract

With the evolution of new technologies, health informatics has come to play a vital role in making day-to-day life more comfortable. The availability of sufficient medical data and computational tools has enabled medical informatics to take a long step toward the next level, Healthcare Industry 4.0. Information engineering and other emerging technologies can be applied to identify chronic diseases such as heart failure and thereby reduce mortality. Machine Learning (ML) based approaches are gaining popularity for predicting these diseases in the fourth-generation healthcare industry. In this paper, several risk factors, e.g., age, sex, total cholesterol level, number of cigarettes smoked per day, glucose level, and systolic blood pressure, are used as input features for predicting whether a person will develop heart disease within the next ten years. The Hard Voting (HV) classifier is built from Logistic Regression (LogReg), Random Forest (RF), Multilayer Perceptron (MLP), and Gaussian Naïve Bayes (GNB) classifiers. RobustScaler was applied to scale the input attributes' values, and the dataset was balanced using Random Undersampling. The HV classifier gives satisfactory performance, with 88.42% test accuracy and precision, recall, F1, and Area Under Curve (AUC) scores of 1, 0.043, 0.082, and 0.73, respectively. The results are also compared using other measures, e.g., Receiver Operating Characteristic (ROC) curves, learning curves, the precision-recall curve, the confusion matrix, Logarithmic Loss (Log Loss), Brier Score Loss (BSL), Matthews Correlation Coefficient (MCC), Mean Absolute Error (MAE), and Mean Squared Error (MSE), to support the claim.
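The abstract names RobustScaler and Random Undersampling without giving their settings. The following is a minimal, stdlib-only sketch of the two ideas, assuming RobustScaler's default behavior (center each feature on its median and divide by its interquartile range) and an undersampler that randomly discards rows of the larger class until the classes are equal in size. The function names `robust_scale` and `random_undersample` are hypothetical illustrations, not the authors' code; in practice the same steps are usually done with scikit-learn's `RobustScaler` and imbalanced-learn's `RandomUnderSampler`.

```python
import random
from statistics import median, quantiles


def robust_scale(column):
    """Center one feature on its median and divide by its interquartile range (IQR)."""
    q1, _, q3 = quantiles(column, n=4, method="inclusive")
    iqr = q3 - q1
    # A constant feature has IQR 0; leave its scale at 1 to avoid dividing by zero.
    scale = iqr if iqr != 0 else 1.0
    med = median(column)
    return [(x - med) / scale for x in column]


def random_undersample(X, y, seed=0):
    """Randomly drop rows so every class keeps as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Sample without replacement; the smallest class is kept whole.
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


# Usage with hypothetical data: the outlier 100 barely affects the median and IQR.
print(robust_scale([1, 2, 3, 4, 100]))
X_bal, y_bal = random_undersample([[i] for i in range(10)], [0] * 8 + [1] * 2, seed=1)
print(y_bal.count(0), y_bal.count(1))
```

Scaling by median and IQR rather than mean and standard deviation keeps a few extreme values (for example, unusually high glucose readings) from compressing everyone else's scaled values. A common precaution, not stated in the abstract, is to fit the scaler and apply undersampling on the training split only, so the test set keeps its original class distribution.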
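Hard voting, as described in the abstract, combines the class labels predicted by LogReg, RF, MLP, and GNB and outputs the label most of them agree on. Below is a minimal sketch of that combination rule only, assuming each base model has already produced a label per sample; the function name `hard_vote` and the prediction lists are hypothetical. With four voters a 2-2 tie can occur; this sketch breaks ties toward the smallest label, which matches the ascending-sort-order tie rule documented for scikit-learn's `VotingClassifier(voting="hard")`. The source does not say how the authors handled ties or which hyperparameters the base models used.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Combine per-model class predictions by majority vote.

    predictions_per_model[m][i] is model m's predicted label for sample i.
    Ties are broken in favor of the smallest label.
    """
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions_per_model)
        top = max(votes.values())
        # Among the labels that received the most votes, choose the smallest.
        combined.append(min(label for label, count in votes.items() if count == top))
    return combined


# Hypothetical predictions from the four base models on four samples.
logreg = [1, 0, 1, 0]
rf = [1, 0, 0, 0]
mlp = [0, 0, 1, 1]
gnb = [1, 1, 0, 1]
print(hard_vote([logreg, rf, mlp, gnb]))
```

Samples 2 and 3 each receive two votes for 0 and two for 1, so the tie rule decides them; with an odd number of binary voters such ties cannot happen.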
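The reported precision of 1 alongside a recall of 0.043 can look contradictory at first. Both follow from the confusion matrix: precision is TP / (TP + FP) and recall is TP / (TP + FN), so a classifier that labels very few samples positive, and is right each time, scores perfect precision while missing most true positives. The sketch below computes these scores plus accuracy, F1, and MCC from binary labels; the function name `binary_metrics` and the toy labels are hypothetical and are not the paper's data.

```python
from math import sqrt


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F1, and Matthews correlation coefficient (MCC)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Each ratio is defined as 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Toy example: the model flags only one sample as positive, and it is correct.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

Here precision is 1.0 but recall is only 0.25, and F1 (the harmonic mean of the two) is pulled down to 0.4. MCC uses all four confusion-matrix cells, which is why it is often reported next to accuracy when the classes are imbalanced.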