Huali Zhou, Qiong Gu, Rong Bao, Liping Qiu, Yuhan Zhang, Fang Wang, Wenlian Liu, Lingling Wu, Li Li, Yihua Ren, Lei Qiu, Qian Wang, Gaomin Zhang, Xiaoqing Qiao, Wenjie Yuan, Juan Ren, Min Luo, Rong Huang, Qing Yang
{"title":"Machine learning based models for predicting presentation delay risk among gastric cancer patients.","authors":"Huali Zhou, Qiong Gu, Rong Bao, Liping Qiu, Yuhan Zhang, Fang Wang, Wenlian Liu, Lingling Wu, Li Li, Yihua Ren, Lei Qiu, Qian Wang, Gaomin Zhang, Xiaoqing Qiao, Wenjie Yuan, Juan Ren, Min Luo, Rong Huang, Qing Yang","doi":"10.3389/fonc.2024.1503047","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>Presentation delay of cancer patients prevents the patient from timely diagnosis and treatment leading to poor prognosis. Predicting the risk of presentation delay is crucial to improve the treatment outcomes. This study aimed to develop and validate prediction models of presentation delay risk in gastric cancer patients by using various machine learning models.</p><p><strong>Methods: </strong>875 cases of gastric cancer patients admitted to a tertiary oncology hospital from July 2023 to June 2024 were used as derivation cohort, 200 cases of gastric cancer patients admitted to other 4 tertiary hospital were used as external validation cohort. After collecting the data, statistical analysis was performed to identify discriminative variables for the prediction of presentation delay and 13 statistically significant variables are selected to develop machine learning models. The derivation cohort was randomly assigned to the training and internal validation set by the ratio of 7:3. Prediction models were developed based on six machine learning algorithms, which are logistic regression (LR), support vector machine (SVM), random forest (RF), gradient boosted trees (GBDT), extremely gradient boosting (XGBoost) and muti-layer perceptron (MLP). The discrimination and calibration of each model were assessed based on various metrics including accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1-Score and area under curve (AUC), calibration curves and Brier scores. The best model was selected based on comparing of various metrics. Based on the selected best model, the impact of features to the prediction result was analyzed with the permutation feature importance method.</p><p><strong>Results: </strong>The incidence of presentation delay for gastric cancer patients was 39.3%. The developed models achieved performance metrics as AUC (0.893-0.925), accuracy (0.817-0.847), sensitivity (0.857-0.905), specificity (0.783-0.854), PPV (0.728-0.798), NPV (0.897-0.927), F1 score (0.791-0.826) and Brier score (0.107-0.138) in internal validation set, which indicated good discrimination and calibration for the prediction of presentation delay in gastric cancer patients. Among all models, RF based model was selected as the best one as it achieved good discrimination and calibration performance on both of internal and external validation set. Feature ranking results indicated that both of subjective and objective factors have significant impact on the occurrence of presentation delay in gastric cancer patients.</p><p><strong>Conclusion: </strong>This study demonstrated that the RF based model has favorable performance for the prediction of presentation delay in gastric cancer patients. It can help medical staffs to screen out high-risk gastric cancer patients for presentation delay, and to take appropriate and specific interventions to reduce the risk of presentation delay.</p>","PeriodicalId":12482,"journal":{"name":"Frontiers in Oncology","volume":"14 ","pages":"1503047"},"PeriodicalIF":3.5000,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11769801/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Oncology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3389/fonc.2024.1503047","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"ONCOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Objective: Presentation delay of cancer patients prevents the patient from timely diagnosis and treatment leading to poor prognosis. Predicting the risk of presentation delay is crucial to improve the treatment outcomes. This study aimed to develop and validate prediction models of presentation delay risk in gastric cancer patients by using various machine learning models.
Methods: 875 cases of gastric cancer patients admitted to a tertiary oncology hospital from July 2023 to June 2024 were used as derivation cohort, 200 cases of gastric cancer patients admitted to other 4 tertiary hospital were used as external validation cohort. After collecting the data, statistical analysis was performed to identify discriminative variables for the prediction of presentation delay and 13 statistically significant variables are selected to develop machine learning models. The derivation cohort was randomly assigned to the training and internal validation set by the ratio of 7:3. Prediction models were developed based on six machine learning algorithms, which are logistic regression (LR), support vector machine (SVM), random forest (RF), gradient boosted trees (GBDT), extremely gradient boosting (XGBoost) and muti-layer perceptron (MLP). The discrimination and calibration of each model were assessed based on various metrics including accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1-Score and area under curve (AUC), calibration curves and Brier scores. The best model was selected based on comparing of various metrics. Based on the selected best model, the impact of features to the prediction result was analyzed with the permutation feature importance method.
Results: The incidence of presentation delay for gastric cancer patients was 39.3%. The developed models achieved performance metrics as AUC (0.893-0.925), accuracy (0.817-0.847), sensitivity (0.857-0.905), specificity (0.783-0.854), PPV (0.728-0.798), NPV (0.897-0.927), F1 score (0.791-0.826) and Brier score (0.107-0.138) in internal validation set, which indicated good discrimination and calibration for the prediction of presentation delay in gastric cancer patients. Among all models, RF based model was selected as the best one as it achieved good discrimination and calibration performance on both of internal and external validation set. Feature ranking results indicated that both of subjective and objective factors have significant impact on the occurrence of presentation delay in gastric cancer patients.
Conclusion: This study demonstrated that the RF based model has favorable performance for the prediction of presentation delay in gastric cancer patients. It can help medical staffs to screen out high-risk gastric cancer patients for presentation delay, and to take appropriate and specific interventions to reduce the risk of presentation delay.
期刊介绍:
Cancer Imaging and Diagnosis is dedicated to the publication of results from clinical and research studies applied to cancer diagnosis and treatment. The section aims to publish studies from the entire field of cancer imaging: results from routine use of clinical imaging in both radiology and nuclear medicine, results from clinical trials, experimental molecular imaging in humans and small animals, research on new contrast agents in CT, MRI, ultrasound, publication of new technical applications and processing algorithms to improve the standardization of quantitative imaging and image guided interventions for the diagnosis and treatment of cancer.