Penerapan Data Mining dalam Analisis Prediksi Kanker Paru Menggunakan Algoritma Random Forest (Application of Data Mining in Lung Cancer Prediction Analysis Using the Random Forest Algorithm)

Laura Sari, Annisa Romadloni, R. Listyaningrum
Journal: Infotekmesin Media Komunikasi Ilmiah Politeknik Cilacap
Published: 2023-01-29
DOI: 10.35970/infotekmesin.v14i1.1751 (https://doi.org/10.35970/infotekmesin.v14i1.1751)
Citations: 0

Abstract

Cancer is the second leading cause of death worldwide, and in Indonesia it is a disease with a high mortality rate. Most patients do not realize that they have lung cancer, so treatment sometimes begins too late. A prediction method with high accuracy is therefore needed to detect lung cancer earlier. Previous research used a data mining classification method with the Naïve Bayes algorithm to predict lung cancer. That work achieved high recall for the positive class (Yes class) but low recall for the negative class (No class). The present study uses the Random Forest algorithm, which is known to perform well, and optimizes the modeling by applying the K-fold Cross Validation technique. The Random Forest algorithm achieves an accuracy of 98.4%, higher than that of the Naïve Bayes algorithm. It produces 100% recall for the positive class and 80% recall for the negative class, and its AUC value of 1 indicates perfect separation of the two classes by predicted score. However, a statistical test at the 5% significance level shows that the results of the two algorithms are not significantly different.
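The per-class recall and accuracy figures quoted in the abstract follow directly from counts of correct and incorrect predictions. The sketch below is only an illustration of that arithmetic and of how K-fold splitting partitions a dataset. The label counts (57 "YES", 5 "NO", one error), the fold count of 10, and the helper names `k_fold_indices`, `per_class_recall`, and `accuracy` are all assumptions made here for illustration; they are not taken from the paper, which does not state its test-set size or its value of K in the abstract.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for plain, unshuffled k-fold CV."""
    # The first (n_samples % k) folds get one extra sample each.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def per_class_recall(y_true, y_pred):
    """Return {label: recall}, where recall = correct predictions / actual members."""
    actual = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / actual[label] for label in actual}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels, NOT the paper's data: 57 "YES" and 5 "NO" cases,
# with a single "NO" case misclassified as "YES".
y_true = ["YES"] * 57 + ["NO"] * 5
y_pred = ["YES"] * 57 + ["NO"] * 4 + ["YES"]

recall = per_class_recall(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(recall, round(acc, 3))

# Assumed k = 10; every sample lands in exactly one test fold.
folds = list(k_fold_indices(len(y_true), 10))
```

With these assumed counts, one misclassified negative case is enough to give 100% recall on the positive class, 80% on the negative class, and an accuracy that rounds to 98.4%. This shows how sensitive negative-class recall is when the negative class is small.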