The Utilization of Decision Tree Algorithm In Order to Predict Heart Disease

M. Mia, Anis Fitri Nur Masruriyah, Adi Rizky Pratama
JURNAL SISFOTEK GLOBAL. Published 2022-09-30. DOI: 10.38101/sisfotek.v12i2.551 (https://doi.org/10.38101/sisfotek.v12i2.551)
Citations: 0

Abstract

Data on heart disease patients obtained from the Ministry of Health of the Republic of Indonesia in 2020 indicate that heart disease cases have increased every year and that heart disease is the leading cause of death in Indonesia, especially among people of productive age. Without proper treatment, patients with heart disease may die earlier, during their productive years. This study therefore built a predictive model intended to help medical personnel address this health problem. Cardiac patient data were classified with the Random Forest and Decision Tree algorithms to create the predictive model. The data turned out to be imbalanced, so oversampling with ADASYN and SMOTE was applied. The study found that ADASYN and SMOTE oversampling had a significant effect on the prediction results of both the C4.5 algorithm and the Random Forest classifier. A confusion matrix was used to compute Precision, Recall, F1-score, and Accuracy. With 10-fold cross-validation and oversampling, SMOTE + RF was among the best-performing combinations, reaching an Accuracy of 93.58%, compared with 90.51% for Random Forest without SMOTE and 93.55% for the combination reported as SMOTE + ADASYN. Applying SMOTE overcame the data imbalance and gave better classification results than applying ADASYN.
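
The abstract names two building blocks without giving an implementation: SMOTE oversampling of the minority class and confusion-matrix metrics. The following is a minimal sketch of both ideas using only the Python standard library. It is not the authors' code: the function names `smote` and `scores`, the parameter `k`, and the toy data are assumptions made for illustration, and the classifiers (Random Forest, C4.5) and the 10-fold cross-validation loop are left out.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def scores(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data (made up for illustration): 3 minority vs 8 majority samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
majority = [(float(i), 5.0) for i in range(8)]

# Oversample the minority class until both classes have the same size.
balanced_minority = minority + smote(minority, len(majority) - len(minority), k=2)
```

In a cross-validated evaluation, oversampling is usually applied to the training folds only, so that synthetic points do not leak into the test fold; whether the study did this is not stated in the abstract.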