Machine-Learning-Based Survival Prediction in Castration-Resistant Prostate Cancer: A Multi-Model Analysis Using a Comprehensive Clinical Dataset.

IF 3 | Zone 3, Medicine | JCR Q2, Health Care Sciences & Services

Journal of Personalized Medicine | Pub Date: 2025-09-08 | DOI: 10.3390/jpm15090432

Jeong Hyun Lee, Jaeyun Jeong, Young Jin Ahn, Kwang Suk Lee, Jong Soo Lee, Seung Hwan Lee, Won Sik Ham, Byung Ha Chung, Kyo Chul Koo


Citations: 0

Abstract

Purpose: Accurate survival prediction is essential for optimizing treatment planning in patients with castration-resistant prostate cancer (CRPC). However, traditional statistical models often underperform because they include few variables and cannot account for complex, multidimensional data interactions.

Methods: We retrospectively collected 46 clinical, laboratory, and pathological variables from 801 patients with CRPC, covering the disease course from initial diagnosis to CRPC progression. Multiple machine learning (ML) models, including random survival forests (RSFs), XGBoost, LightGBM, and logistic regression, were developed to predict cancer-specific mortality (CSM), overall mortality (OM), and 2- and 3-year survival status. The dataset was split into training and test cohorts (80:20), with 10-fold cross-validation. Performance was assessed using the C-index for regression models and the AUC, accuracy, precision, recall, and F1-score for classification models. Model interpretability was assessed using SHapley Additive exPlanations (SHAP).

Results: Over a median follow-up of 24 months, 70.6% of patients experienced CSM. RSFs achieved the highest C-index in the test set for both CSM (0.772) and OM (0.771). For classification tasks, RSFs demonstrated superior performance in predicting 2-year survival, while XGBoost yielded the highest F1-score for 3-year survival. The SHAP analysis identified time to first-line CRPC treatment, hemoglobin level, and alkaline phosphatase level as key predictors of survival outcomes.

Conclusion: The RSF and XGBoost models outperformed traditional statistical methods in predicting survival in CRPC. These models offer accurate and interpretable prognostic tools that may inform personalized treatment strategies. External validation and the integration of emerging therapies are warranted for broader clinical applicability.
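The evaluation metrics named in the Methods can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the abstract does not say which C-index variant or software was used, so this assumes Harrell's C-index with tied follow-up times skipped, and the function names (`concordance_index`, `classification_metrics`) and all toy values are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index (simplified; assumed variant, not from the paper).

    times:       follow-up time per patient (e.g. months)
    events:      1 if death was observed, 0 if the patient was censored
    risk_scores: model output, where higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the earlier time is an observed
        # event; pairs with tied times are skipped in this sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # the earlier death had the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Invented toy cohort: five patients, two of them censored.
    months = [5, 12, 20, 24, 30]
    died = [1, 1, 0, 1, 0]
    risk = [0.9, 0.4, 0.6, 0.5, 0.1]
    print(concordance_index(months, died, risk))  # 0.75 on this toy data

    # Invented 2-year status labels (1 = died within 2 years).
    print(classification_metrics([1, 1, 1, 0, 0, 1], [1, 0, 1, 0, 1, 1]))
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a scale for reading the reported test-set values of 0.772 (CSM) and 0.771 (OM). Censored patients contribute only to pairs in which the other patient died earlier, which is why a ranking metric is used for the time-to-event models instead of plain accuracy.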

Source journal

Journal of Personalized Medicine, Medicine-Medicine (miscellaneous)

CiteScore: 4.10

Self-citation rate: 0.00%

Articles published: 1878

Review time: 11 weeks

About the journal: Journal of Personalized Medicine (JPM; ISSN 2075-4426) is an international, open access journal aimed at bringing all aspects of personalized medicine to one platform. JPM publishes cutting-edge, innovative preclinical and translational scientific research and technologies related to personalized medicine (e.g., pharmacogenomics/proteomics, systems biology). JPM recognizes that personalized medicine (the assessment of genetic, environmental and host factors that cause variability of individuals) is a challenging, transdisciplinary topic that requires discussions from a range of experts. For a comprehensive perspective on personalized medicine, JPM aims to integrate expertise from the molecular and translational sciences, therapeutics and diagnostics, as well as discussions of regulatory, social, ethical and policy aspects. We provide a forum to bring together academic and clinical researchers, biotechnology, diagnostic and pharmaceutical companies, health professionals, regulatory and ethical experts, and government and regulatory authorities.