Implementation of Hyperparameters to the Ensemble Learning Method for Lung Cancer Classification

Building of Informatics, Technology and Science (BITS), Pub Date: 2023-09-30, DOI: 10.47065/bits.v5i2.4096

Ridlo Yanuar, Siti Sa’adah, Prasti Eko Yunanto


Abstract

Lung cancer is the most common cause of death among cancer patients, which reflects the importance of the lungs for breathing and for distributing oxygen throughout the body. Early identification of lung cancer is crucial to reducing its mortality rate. Accuracy matters because it indicates how often a model makes correct predictions; a highly accurate model produces trustworthy findings, which is essential for making effective decisions from the available data. In this research, ensemble learning approaches, namely bagging and boosting methods, were employed to classify lung cancer. Hyperparameters are crucial to the effectiveness of these models, so a thorough search was conducted to identify the hyperparameter combination that gives the highest classification accuracy. The dataset is a medical dataset containing the histories of patients who either have or have not been diagnosed with lung cancer. It was taken from two sources: mysarahmadbhat on Kaggle and cancerdatahp on data.world. To evaluate the models, this study used a confusion matrix, which compares each model's predictions with the ground truth. The findings show that a 70:30 dataset split produced the best results for the Random Forest, CatBoost, and XGBoost models, each of which achieved 98% accuracy, 0.98 precision, 0.98 recall, and a 0.98 F1-score. For AdaBoost, the best results were obtained with an 80:20 split: 96% accuracy, 0.97 precision, 0.96 recall, and a 0.96 F1-score.
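The four reported metrics all derive from the confusion matrix. The following is a minimal sketch of that calculation for a binary problem, not the authors' code: the function names, the choice of 1 as the positive label, and the sample labels are assumptions made for illustration, and the abstract does not say whether the reported precision and recall are per-class or averaged.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative labels only; these are NOT data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In a hyperparameter search of the kind described, a routine like this would be run on the held-out portion of each split (for example the 30% in a 70:30 split) for every candidate configuration, and the configuration with the best score would be kept.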
