Improving the Accuracy of Oncology Diagnosis: A Machine Learning-Based Approach to Cancer Prediction

International Journal of Online and Biomedical Engineering (iJOE), Pub Date: 2024-08-08, DOI: 10.3991/ijoe.v20i11.49139

M. Cabanillas-Carbonell, Joselyn Zapata-Paulini



Abstract

Cancer is among the most lethal illnesses worldwide, and predicting its onset can improve people’s quality of life by enabling preventive measures that improve treatment and survival. This study compared machine learning models to determine which achieves the highest accuracy in tumor type classification, distinguishing malignant (cancerous) from benign tumors. The models evaluated were decision tree (DT), naive Bayes (NB), extra trees classifier (ETM), random forest (RF), K-means clustering (K-means), logistic regression (LR), adaptive boosting (AdaBoost), gradient boosting (GB), light gradient boosting machine (LightGBM), and extreme gradient boosting (XGBoost). The models were trained on a dataset of 569 records and 32 variables containing patient information and tumor characteristics. The study is organized into sections covering related studies, descriptions of the models, case study development, results, discussion, and conclusions. Model performance was evaluated using precision, sensitivity, accuracy, and F1 score. After training, the XGBoost model performed best, achieving 98% on precision, accuracy, sensitivity, and F1 score.
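The four metrics named in the abstract can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code. The label strings `"M"` (malignant) and `"B"` (benign), the helper names, and the eight example predictions are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion counts with 'malignant' as the positive class."""
    tp: int  # malignant tumors predicted as malignant
    fp: int  # benign tumors predicted as malignant
    fn: int  # malignant tumors predicted as benign
    tn: int  # benign tumors predicted as benign


def confusion_counts(y_true, y_pred, positive="M"):
    """Tally confusion counts; the label 'M' is an assumed encoding."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def classification_metrics(c):
    """Return precision, sensitivity (recall), accuracy, and F1 score."""
    total = c.tp + c.fp + c.fn + c.tn
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    sensitivity = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    accuracy = (c.tp + c.tn) / total if total else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "accuracy": accuracy,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical labels, not data from the study.
    y_true = ["M", "M", "M", "B", "B", "B", "B", "M"]
    y_pred = ["M", "M", "B", "B", "B", "M", "B", "M"]
    print(classification_metrics(confusion_counts(y_true, y_pred)))
```

In a tumor classification setting, sensitivity is often the metric watched most closely, since a false negative corresponds to a malignant tumor labeled benign. F1 balances it against precision.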

