K. Reddy, I. Elamvazuthi, A. A. Aziz, S. Paramasivam, Hui Na Chua, S. Pranavanand
2021 8th NAFOSTED Conference on Information and Computer Science (NICS), published 2021-12-21
DOI: 10.1109/NICS54270.2021.9701455 (https://doi.org/10.1109/NICS54270.2021.9701455)
Rotation Forest Ensemble Classifier to Improve the Cardiovascular Disease Risk Prediction Accuracy
Predicting heart disease risk is important because heart disease is one of the primary causes of sudden death worldwide. Early-stage prediction can save lives by prompting patients to undergo appropriate diagnostic steps or make necessary lifestyle changes. Recent studies have focused on using data mining and machine learning to detect diseases from specific features of a person. This work proposes the Rotation Forest, a tree-based ensemble classifier that uses Principal Component Analysis for feature extraction, to improve the prediction accuracy of heart disease risk. The Statlog heart dataset from the publicly available UCI machine learning repository was used. Rotation Forest ensemble classifiers were trained on the dataset, first with the default base classifier J48 and then with Random Forest, on the full feature set and on features selected by the One Rule and Support Vector Machines attribute evaluators. Performance was compared with standard machine learning classifiers: Naïve Bayes, Logistic Regression, Support Vector Machines, K-Nearest Neighbors, AdaBoostM1, and Bagging. The Rotation Forest with Random Forest achieved the highest accuracy, 94.44%, with an area under the ROC curve of 0.980, on the Statlog features selected by the One Rule method.
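The best result above came from features selected with the One Rule method, so a small sketch of that idea may help. It scores each attribute by how accurately a single-attribute rule (each attribute value predicts its majority class) classifies the data, then ranks attributes by that score. The abstract does not give the evaluator's settings, so scoring on the training data with already-discrete values is an assumption here, and the function names, attribute names, and toy rows are invented for illustration; they are not the Statlog data.

```python
from collections import Counter, defaultdict


def one_rule_accuracy(values, labels):
    """Accuracy of a rule that maps each attribute value to its majority class."""
    class_counts = defaultdict(Counter)
    for value, label in zip(values, labels):
        class_counts[value][label] += 1
    # For each attribute value, the majority class gets that many rows right.
    correct = sum(c.most_common(1)[0][1] for c in class_counts.values())
    return correct / len(labels)


def rank_attributes(rows, labels, names):
    """Return (attribute name, One Rule accuracy) pairs, best attribute first."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((name, one_rule_accuracy(column, labels)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data (assumption): three discrete attributes, binary label.
names = ["chest_pain", "sex", "fasting_sugar"]
rows = [
    ("typical", "m", "high"),
    ("typical", "f", "low"),
    ("atypical", "m", "low"),
    ("atypical", "f", "high"),
    ("typical", "m", "low"),
    ("atypical", "f", "low"),
]
labels = [1, 1, 0, 0, 1, 0]

ranking = rank_attributes(rows, labels, names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Keeping only the top-ranked attributes from such a list, and then training the ensemble on them, corresponds to the "selected features" setting described in the abstract. Numeric attributes would first need to be discretized, which this sketch leaves out.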