Explainable and visualizable machine learning models to predict biochemical recurrence of prostate cancer
Wenhao Lu, Lin Zhao, Shenfan Wang, Huiyong Zhang, Kangxian Jiang, Jin Ji, Shaohua Chen, Chengbang Wang, Chunmeng Wei, Rongbin Zhou, Zuheng Wang, Xiao Li, Fubo Wang, Xuedong Wei, Wenlei Hou
Clinical and Translational Oncology, published 2024-04-11. DOI: https://doi.org/10.1007/s12094-024-03480-x
Abstract
Purpose
Machine learning (ML) models have shown excellent performance in prognosis prediction. However, their black-box nature has limited their clinical application. Here, we aimed to establish explainable and visualizable ML models to predict biochemical recurrence (BCR) of prostate cancer (PCa).
Materials and methods
A total of 647 PCa patients were retrospectively evaluated. Clinical parameters were selected using LASSO regression. The cohort was then split into training and validation datasets at a ratio of 0.75:0.25, and the BCR-related features were entered into Cox regression and five ML algorithms to construct BCR prediction models. The clinical utility of each model was evaluated using concordance index (C-index) values and decision curve analysis (DCA). In addition, Shapley Additive Explanations (SHAP) values were used to explain the features in the models.
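As an illustration of the C-index used for evaluation, the following is a minimal sketch of Harrell's concordance index for right-censored data in plain Python. It is not the authors' code: the function name `harrell_c_index`, the toy follow-up times, event flags, and risk scores are all assumptions made for this example, and pairs with tied times are simply skipped, which is a simplification.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       follow-up time per patient
    events:      1 if BCR was observed, 0 if the patient was censored
    risk_scores: model output; a higher score means earlier expected BCR
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the shorter
        # time is an observed event (simplification: tied times skipped).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event got the higher risk score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied scores count as half concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data for five patients (not taken from the study).
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risk_scores = [0.9, 0.4, 0.7, 0.5, 0.1]

print(harrell_c_index(times, events, risk_scores))  # 0.75
```

In this toy case 8 pairs are comparable and 6 of them are ranked correctly, giving 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported model C-indices should be read.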
Results
We identified 11 BCR-related features using LASSO regression and then established five ML-based models, namely random survival forest (RSF), survival support vector machine (SSVM), survival tree (sTree), gradient boosting decision tree (GBDT), and extreme gradient boosting (XGBoost), together with a Cox regression model. Their C-indices were 0.846 (95% CI 0.796–0.894), 0.774 (95% CI 0.712–0.834), 0.757 (95% CI 0.694–0.818), 0.820 (95% CI 0.765–0.869), 0.793 (95% CI 0.735–0.852), and 0.807 (95% CI 0.753–0.858), respectively. DCA showed that the RSF model had significant advantages over all other models. Regarding interpretability, the SHAP values demonstrated the tangible contribution of each feature in the RSF model.
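To make the idea of a per-feature contribution concrete, the sketch below computes exact Shapley values for a tiny model by averaging each feature's marginal contribution over all feature orderings. This is only the underlying principle, not the SHAP tooling or the RSF model used in the study: the function `exact_shapley`, the three-feature `toy_risk` model, and the input and baseline values are hypothetical.

```python
from itertools import permutations


def exact_shapley(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features not yet added to the coalition keep their baseline value.
    The cost grows factorially with the number of features, so this is
    only practical for very small examples.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = predict(current)
        for k in order:
            current[k] = x[k]        # add feature k to the coalition
            new = predict(current)
            phi[k] += new - prev     # marginal contribution of feature k
            prev = new
    return [p / len(orders) for p in phi]


def toy_risk(z):
    # Hypothetical risk score: one additive term plus one interaction.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]          # hypothetical patient
baseline = [0.0, 0.0, 0.0]   # hypothetical reference input

phi = exact_shapley(toy_risk, x, baseline)
print(phi)  # [2.0, 3.0, 3.0]
```

The contributions sum to the difference between the prediction for the patient and the prediction for the baseline (here 8.0), and the interaction term is split equally between the two features involved. Practical SHAP implementations approximate or exploit model structure rather than enumerating orderings.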
Conclusions
Our scoring system provides a reference for identifying BCR and for building a framework for personalized therapeutic decision-making in PCa.