Individual risk and prognostic value prediction by interpretable machine learning for distant metastasis in neuroblastoma: A population-based study and an external validation

IF 3.7 | Zone 2 (Medicine) | JCR Q2: Computer Science, Information Systems
Shan Li, Jinkui Wang, Zhaoxia Zhang, Chunnian Ren, Dawei He
International Journal of Medical Informatics, volume 196, Article 105813. Published 2025-01-29. DOI: 10.1016/j.ijmedinf.2025.105813. https://www.sciencedirect.com/science/article/pii/S1386505625000309

Abstract

Purpose

Neuroblastoma (NB) is a childhood malignancy with a poor prognosis and a propensity for distant metastasis (DM). We aimed to establish a machine learning (ML)-based model to accurately predict the risk of DM and the prognosis of NB patients with DM.

Methods

We analyzed NB patients recorded in the Surveillance, Epidemiology, and End Results (SEER) database between 2000 and 2020. Univariate and multivariate logistic regression analyses were used to select meaningful variables. Recursive Feature Elimination (RFE) based on six ML algorithms was used for feature selection. To construct the predictive model, 13 ML algorithms were evaluated by area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, precision, cross-entropy, Brier score, balanced accuracy and F-beta score. The optimal ML model was then built to predict DM, and its predictions were explained with the SHapley Additive exPlanations (SHAP) framework. In addition, 101 ML algorithm combinations were developed, and the one with the highest C-index was selected to predict the prognosis of NB patients with DM.
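The evaluation metrics listed above can all be derived from true labels and predicted probabilities. The following is a minimal sketch in plain Python, not the authors' code: the function name `binary_metrics`, the 0.5 decision threshold, `beta=1.0`, and the toy data are assumptions made purely for illustration, and the sketch assumes both classes are present and at least one case is predicted positive.

```python
import math


def binary_metrics(y_true, y_prob, threshold=0.5, beta=1.0):
    """Compute common binary-classification metrics (illustrative helper)."""
    n = len(y_true)
    y_pred = [1 if p >= threshold else 0 for p in y_prob]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    sensitivity = tp / (tp + fn)          # recall / true-positive rate
    specificity = tn / (tn + fp)          # true-negative rate
    precision = tp / (tp + fp)            # positive predictive value
    accuracy = (tp + tn) / n
    balanced_accuracy = (sensitivity + specificity) / 2
    b2 = beta ** 2
    f_beta = (1 + b2) * precision * sensitivity / (b2 * precision + sensitivity)

    # Probability-based scores (lower is better for both).
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / n
    eps = 1e-15  # clip probabilities to avoid log(0)
    cross_entropy = -sum(
        t * math.log(min(max(p, eps), 1 - eps))
        + (1 - t) * math.log(1 - min(max(p, eps), 1 - eps))
        for t, p in zip(y_true, y_prob)
    ) / n

    # AUC as the probability that a random positive outranks a random
    # negative (ties count one half).
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos for pn in neg
    )
    auc = wins / (len(pos) * len(neg))

    return {
        "accuracy": accuracy, "sensitivity": sensitivity,
        "specificity": specificity, "precision": precision,
        "balanced_accuracy": balanced_accuracy, "f_beta": f_beta,
        "brier": brier, "cross_entropy": cross_entropy, "auc": auc,
    }


# Toy data, invented for illustration only (not from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
metrics = binary_metrics(y_true, y_prob)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

AUC, Brier score and cross-entropy are computed from the probabilities themselves, whereas accuracy, sensitivity, specificity, precision, balanced accuracy and F-beta all depend on the chosen threshold, so a reported value of the latter group is only meaningful together with the cutoff used.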

Results

A total of 1,668 NB patients from the SEER database were consecutively enrolled. Tumor primary site, grade, surgery type, regional lymph nodes, radiotherapy and chemotherapy were identified as significant risk factors for DM. The CatBoost model was selected as the best prediction model, with AUCs of 0.846 (95% CI: [0.804, 0.899]), 0.834 (95% CI: [0.796, 0.873]) and 0.813 (95% CI: [0.776, 0.852]) in the training, internal test and external test sets, respectively, and with 0.777 accuracy, 0.839 sensitivity, 0.72 specificity and 0.731 precision in the training set. According to the SHAP results, grade, chemotherapy and radiotherapy had the greatest effects on DM. For prognosis prediction, the “RSF + GBM” combination was the best prognostic model, with C-indexes of 0.656, 0.611 and 0.629 in the training, internal test and external test sets, respectively.
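For readers less familiar with the C-index reported for the prognostic model, the sketch below shows one common definition, Harrell's concordance index, in plain Python. It is an illustration under stated assumptions rather than the authors' implementation: the function name `concordance_index`, the simplified handling of tied survival times (such pairs are skipped), and the toy data are all hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (illustrative).

    times:       observed follow-up time per patient
    events:      1 if the event (e.g. death) was observed, 0 if censored
    risk_scores: model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # higher risk, shorter survival
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count one half
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Toy data, invented for illustration only (not from the study).
times = [5, 10, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risk_scores = [0.9, 0.4, 0.6, 0.5, 0.1]
c_index = concordance_index(times, events, risk_scores)
print(f"C-index: {c_index:.3f}")
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by survival time, which gives a scale for reading the reported values.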

Conclusions

Our ML models demonstrate excellent accuracy and reliability, offering more precise, personalized metastasis diagnosis and prognostic prediction for NB patients.
Source journal
International Journal of Medical Informatics (Medicine; Computer Science: Information Systems)
CiteScore: 8.90
Self-citation rate: 4.10%
Articles published: 217
Review time: 42 days
About the journal: International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings. The scope of the journal covers: information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.; computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.; educational computer-based programs pertaining to medical informatics or medicine in general; and organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.