Rotation Forest Ensemble Classifier to Improve the Cardiovascular Disease Risk Prediction Accuracy

2021 8th NAFOSTED Conference on Information and Computer Science (NICS). Pub Date: 2021-12-21. DOI: 10.1109/NICS54270.2021.9701455

K. Reddy, I. Elamvazuthi, A. A. Aziz, S. Paramasivam, Hui Na Chua, S. Pranavanand


Citations: 4

Abstract



Predicting heart disease risk is important because heart disease is one of the primary causes of sudden death worldwide. Early-stage prediction can save lives by prompting patients to undergo appropriate diagnostic steps or make necessary lifestyle changes. Recent studies have focused on using data mining and machine learning to detect diseases from a person's specific features. This work proposes Rotation Forest, a tree-based ensemble classifier that uses Principal Component Analysis for feature extraction, to improve the prediction accuracy of heart disease risk. The Statlog heart dataset was selected from the publicly available UCI Machine Learning Repository. A Rotation Forest ensemble classifier was trained on the dataset, first with its default base classifier J48 and then with Random Forest as the base classifier, on both the full feature set and the feature subsets selected by the One Rule and Support Vector Machine attribute evaluators. Its performance was compared with that of standard machine learning classifiers: Naïve Bayes, Logistic Regression, Support Vector Machines, K-Nearest Neighbors, AdaBoostM1, and Bagging. Rotation Forest with Random Forest achieved the highest accuracy, 94.44%, and an area under the ROC curve of 0.980, on the Statlog features selected by the One Rule method.
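
The One Rule attribute evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the features are already categorical (numeric attributes would need discretizing first) and scores each feature by the training accuracy of a one-feature majority-vote rule. The function names, feature names, and toy records are hypothetical and are not taken from the paper or from the Statlog data; this is not the authors' implementation.

```python
from collections import Counter, defaultdict


def one_rule_accuracy(values, labels):
    """Training accuracy of a One Rule classifier built on a single feature.

    For each distinct feature value, the rule predicts the majority class
    among the rows that share that value.
    """
    counts = defaultdict(Counter)
    for value, label in zip(values, labels):
        counts[value][label] += 1
    # Correctly classified rows = size of the majority class in each bucket.
    correct = sum(max(c.values()) for c in counts.values())
    return correct / len(labels)


def rank_features(rows, labels, names):
    """Return (name, accuracy) pairs sorted from most to least predictive."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((name, one_rule_accuracy(column, labels)))
    # sorted() is stable, so tied features keep their original order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy records, NOT taken from the Statlog heart dataset.
NAMES = ["chest_pain", "sex", "high_bp"]
ROWS = [
    ("typical", "m", "yes"),
    ("typical", "f", "no"),
    ("atypical", "m", "no"),
    ("atypical", "f", "yes"),
    ("typical", "m", "yes"),
    ("atypical", "f", "no"),
]
LABELS = ["disease", "disease", "healthy", "healthy", "disease", "healthy"]

ranking = rank_features(ROWS, LABELS, NAMES)
# Keep the two best-scoring features as the "selected" subset.
selected = [name for name, _ in ranking[:2]]
print(ranking)
print(selected)
```

In this toy data, `chest_pain` separates the two classes perfectly and therefore ranks first. A real evaluation would normally score each rule on held-out data rather than on the training rows, since training accuracy favors features with many distinct values.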

