OPTIMIZATION OF LUNG CANCER CLASSIFICATION METHOD USING EDA-BASED MACHINE LEARNING

Windania Purba, Sumita Wardani, Diana Febrina Lumbantoruan, Fransiska Celia Ivoi Silalahi, Thomas Leo Edison
Jusikom : Jurnal Sistem Informasi Ilmu Komputer, published 2023-02-28. DOI: 10.34012/jurnalsisteminformasidanilmukomputer.v6i2.3413

Abstract

Lung cancer is one of the three deadliest diseases in the world, and it has been developing rapidly. Motivated by this, the researchers set out to predict the factors that influence lung cancer. One way to identify these factors is to use data mining and classification techniques, so the researchers compared several popular classification algorithms to find the most accurate one for lung cancer classification. The algorithms used are K-Nearest Neighbor, Random Forest Classifier, Logistic Regression, Linear SVM, Naïve Bayes, Decision Tree, Random Forest, Gradient Boosting, Kernel SVM, and MLPClassifier. These algorithms were chosen because a study the researchers found on the Kaggle platform compared them on a breast cancer dataset. Previous breast cancer studies reported accuracies of 96.47% for SVM, 97.06% for Neural Networks, and 91.18% for Naïve Bayes. This study differs from that earlier work in two ways. It applies the wider set of machine learning algorithms listed above, and it tests whether their accuracy changes when they are applied to a lung cancer dataset. The most accurate algorithms were Random Forest and Gradient Boosting, both with an accuracy of 100%, the same result as in the previous studies. Gradient Boosting nevertheless had the higher accuracy value of the two. Finally, based on the data used in this study, the factors with the greatest influence on predicting a lung cancer diagnosis are obesity and coughing up blood.
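The abstract does not state how accuracy was measured. The sketch below shows one common way to compare classifiers on equal terms: k-fold cross-validation, where each model is scored only on rows it was not trained on. It implements two of the listed algorithms, K-Nearest Neighbor and a Gaussian variant of Naïve Bayes, in plain Python. The function names, the toy rows, and the choice of Gaussian likelihoods are illustrative assumptions, not the authors' code or dataset.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[float, ...]
# A classifier here is a function (train_X, train_y, query_row) -> predicted label.
Classifier = Callable[[Sequence[Row], Sequence[str], Row], str]


def knn_predict(train_X: Sequence[Row], train_y: Sequence[str], query: Row, k: int = 3) -> str:
    """K-Nearest Neighbor: majority vote among the k closest training rows (Euclidean)."""
    neighbours = sorted((math.dist(x, query), lbl) for x, lbl in zip(train_X, train_y))
    votes = Counter(lbl for _, lbl in neighbours[:k])
    return votes.most_common(1)[0][0]


def gaussian_nb_predict(train_X: Sequence[Row], train_y: Sequence[str], query: Row) -> str:
    """Gaussian Naive Bayes: log prior plus per-feature Gaussian log-likelihoods."""
    best_label, best_score = "", -math.inf
    for label in sorted(set(train_y)):
        rows = [x for x, lbl in zip(train_X, train_y) if lbl == label]
        score = math.log(len(rows) / len(train_X))  # log prior P(label)
        for j, value in enumerate(query):
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            # Small constant keeps the variance positive when a column is constant.
            var = sum((c - mean) ** 2 for c in col) / len(col) + 1e-9
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_val_accuracy(clf: Classifier, X: List[Row], y: List[str], n_folds: int = 5) -> float:
    """k-fold cross-validation: every row is predicted exactly once by a model
    trained on the other folds; returns the fraction predicted correctly."""
    correct = 0
    for fold in range(n_folds):
        train_X = [X[i] for i in range(len(X)) if i % n_folds != fold]
        train_y = [y[i] for i in range(len(X)) if i % n_folds != fold]
        for i in range(fold, len(X), n_folds):  # rows held out in this fold
            correct += clf(train_X, train_y, X[i]) == y[i]
    return correct / len(X)


# Hypothetical toy data (NOT the paper's dataset): (obesity level, coughing-of-blood level).
X: List[Row] = [(1, 2), (7, 8), (2, 1), (8, 7), (2, 3), (8, 9),
                (3, 2), (9, 8), (1, 1), (7, 7), (3, 3), (9, 9)]
y: List[str] = ["low", "high"] * 6

models: Dict[str, Classifier] = {
    "K-Nearest Neighbor": knn_predict,
    "Naive Bayes": gaussian_nb_predict,
}
results = {name: cross_val_accuracy(clf, X, y) for name, clf in models.items()}

if __name__ == "__main__":
    for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:>20}: {acc:.2%}")
```

In practice the same comparison is usually written with scikit-learn's `cross_val_score` over a dictionary of estimators. The plain-Python version keeps every step of the protocol visible.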
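The abstract names obesity and coughing up blood as the most influential factors but does not say how influence was measured. One simple exploratory (EDA-style) way to rank features, assumed here purely for illustration, is the absolute Pearson correlation of each feature with a binary diagnosis label. With a 0/1 label this is the point-biserial correlation. The feature names, values, and the binary label are hypothetical. Tree ensembles such as Random Forest or Gradient Boosting would more often report their impurity-based feature importances instead.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; with a 0/1 target this is the point-biserial correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0  # constant column -> no correlation


def rank_features(features: Dict[str, List[float]], target: List[int]) -> List[Tuple[str, float]]:
    """Sort features by absolute correlation with the target, strongest first."""
    scores = [(name, pearson(values, target)) for name, values in features.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data (NOT the paper's dataset): 1 = lung cancer, 0 = no lung cancer.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "age": [40, 55, 30, 60, 45, 50, 35, 65],
    "obesity": [1, 2, 1, 2, 7, 8, 7, 8],
    "coughing_of_blood": [2, 1, 3, 2, 6, 7, 5, 8],
}

ranking = rank_features(features, target)

if __name__ == "__main__":
    for name, r in ranking:
        print(f"{name:>18}: r = {r:+.3f}")
```

A correlation ranking only captures linear, one-feature-at-a-time association. That makes it a quick first look during EDA rather than a substitute for a model-based importance measure.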