Development and validation of an interpretable machine learning model for predicting the risk of distant metastasis in papillary thyroid cancer: a multicenter study.

IF 9.6 | Zone 1 (Medicine) | JCR Q1, MEDICINE, GENERAL & INTERNAL
EClinicalMedicine Pub Date : 2024-10-30 eCollection Date: 2024-11-01 DOI:10.1016/j.eclinm.2024.102913
Fei Hou, Yun Zhu, Hongbo Zhao, Haolin Cai, Yinghui Wang, Xiaoqi Peng, Lin Lu, Rongli He, Yan Hou, Zhenhui Li, Ting Chen
EClinicalMedicine, volume 77, article 102913. Publication type: Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11567106/pdf/
Citations: 0

Abstract

Background: Distant metastasis (DM) significantly reduces survival in patients with papillary thyroid carcinoma (PTC). An effective method for early prediction of DM risk is important for formulating individualized diagnosis and treatment plans and for improving prognosis. Previous studies have significant limitations, so new models for predicting the risk of DM in PTC are still needed. We aimed to develop and validate interpretable machine learning (ML) models for early prediction of DM in patients with PTC using a multicenter cohort.

Methods: We collected data on patients with PTC who were admitted between June 2013 and May 2023. Data from 1430 patients at Yunnan Cancer Hospital (YCH) served as the training and internal validation set, while data from 434 patients at the First Affiliated Hospital of Kunming Medical University (KMU 1st AH) were used as the external test set. Nine ML methods, including random forest (RF), were used to construct the models. Model prediction performance was compared using evaluation indicators such as the area under the receiver operating characteristic curve (AUC). The SHapley Additive exPlanation (SHAP) method was used to rank feature importance and explain the final model.
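The evaluation indicators used to compare models can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy data are assumptions made for illustration, and the abstract does not state how the confidence intervals were obtained, so none is computed here. AUC is computed as the probability that a randomly chosen patient with DM receives a higher predicted risk than a randomly chosen patient without DM, with ties counted as one half.

```python
from typing import Dict, Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def threshold_metrics(
    y_true: Sequence[int], probs: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity, specificity, and F1 at an assumed decision threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, d in zip(y_true, preds) if y == 1 and d == 1)
    fn = sum(1 for y, d in zip(y_true, preds) if y == 1 and d == 0)
    tn = sum(1 for y, d in zip(y_true, preds) if y == 0 and d == 0)
    fp = sum(1 for y, d in zip(y_true, preds) if y == 0 and d == 1)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }


if __name__ == "__main__":
    # Toy labels (1 = DM) and predicted risks; not data from the study.
    labels = [0, 0, 1, 1]
    risks = [0.1, 0.4, 0.35, 0.8]
    print("AUC:", auc(labels, risks))
    print("Brier:", brier_score(labels, risks))
    print(threshold_metrics(labels, risks))
```

A higher AUC and a lower Brier score both indicate better performance. The pairwise AUC loop is quadratic in cohort size, which is acceptable for a sketch but would normally be replaced by a rank-based routine from a statistics library.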

Findings: Among the nine ML models, the RF model performed best. It accurately predicted the risk of DM in patients with PTC in both the internal validation of the training set [AUC: 0.913, 95% confidence interval (CI) (0.9075-0.9185)] and the external test set [AUC: 0.8996, 95% CI (0.8483-0.9509)]. The calibration curve showed high agreement between the predicted and observed risks. In the sensitivity analysis focusing on DM sites of PTC, the RF model exhibited outstanding performance in predicting "lung-only metastasis", with high AUC, specificity, sensitivity, and F1 score and a low Brier score. SHAP analysis identified variables that contributed to the model predictions. An online calculator based on the RF model was developed and made available for clinicians at https://predictingdistantmetastasis.shinyapps.io/shiny1/. Eleven variables were included in the final RF model: age of the patient with PTC, whether the tumor size is > 2 cm, whether the tumor size is ≤ 1 cm, lymphocyte (LYM) count, monocyte (MONO) count, monocyte/lymphocyte ratio (MLR), thyroglobulin (TG) level, thyroid peroxidase antibody (TPOAb) level, whether the T stage is T1/2, whether the T stage is T3/4, and whether the N stage is N0.
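The variables in the final model mix continuous measurements with yes/no indicators, and one of them (MLR) is derived from two others. The Python sketch below shows one way such a feature vector could be assembled from a patient record. The `Patient` class, the field and feature names, the stage-string format, and the 0/1 coding are all hypothetical; the abstract does not specify units or how the authors encoded the inputs.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Patient:
    """Hypothetical raw record; field names and units are assumptions."""
    age: float
    tumor_size_cm: float
    lym: float      # lymphocyte count
    mono: float     # monocyte count
    tg: float       # thyroglobulin level
    tpoab: float    # thyroid peroxidase antibody level
    t_stage: str    # e.g. "T1", "T1b", "T3"
    n_stage: str    # e.g. "N0", "N1a"


def encode_features(p: Patient) -> Dict[str, float]:
    """Build the 11 model inputs named in the abstract (coding assumed)."""
    if p.lym <= 0:
        raise ValueError("lymphocyte count must be positive to compute MLR")
    t = p.t_stage.strip().upper()
    n = p.n_stage.strip().upper()
    return {
        "age": p.age,
        "size_gt_2cm": float(p.tumor_size_cm > 2.0),
        "size_le_1cm": float(p.tumor_size_cm <= 1.0),
        "lym": p.lym,
        "mono": p.mono,
        # MLR is the monocyte count divided by the lymphocyte count.
        "mlr": p.mono / p.lym,
        "tg": p.tg,
        "tpoab": p.tpoab,
        "t_stage_t1_2": float(t.startswith(("T1", "T2"))),
        "t_stage_t3_4": float(t.startswith(("T3", "T4"))),
        "n_stage_n0": float(n.startswith("N0")),
    }


if __name__ == "__main__":
    # Illustrative values only; not a patient from the study.
    example = Patient(age=45, tumor_size_cm=1.5, lym=2.0, mono=0.5,
                      tg=10.0, tpoab=5.0, t_stage="T1b", n_stage="N1a")
    for name, value in encode_features(example).items():
        print(f"{name}: {value}")
```

With this coding, a tumor between 1 and 2 cm has both size indicators set to 0, and an unrecognized T stage string sets both T-stage indicators to 0. Whether the original model treats these cases the same way is not stated in the abstract.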

Interpretation: On the basis of large-sample and multicenter data, we developed and validated an explainable ML model for predicting the risk of DM in patients with PTC. The model helps clinicians to identify high-risk patients early and provides a basis for individualized patient treatment plans.

Funding: This work was supported by the National Natural Science Foundation of China (Nos. 81960426, 82360345 and 82001986), the Outstanding Youth Science Foundation of Yunnan Basic Research Project (No. 202401AY070001-316), the Yunnan Province Applied and Basic Research Foundation (No. 202401AT070008), and the Ten Thousand Talent Plans for Young Top-notch Talents of Yunnan Province.

Source journal
EClinicalMedicine (Medicine: Medicine (all))
CiteScore: 18.90
Self-citation rate: 1.30%
Articles published: 506
Review time: 22 days
Journal description: eClinicalMedicine is a gold open-access clinical journal designed to support frontline health professionals in addressing the complex and rapid health transitions affecting societies globally. The journal aims to assist practitioners in overcoming healthcare challenges across diverse communities, spanning diagnosis, treatment, prevention, and health promotion. Integrating disciplines from various specialties and life stages, it seeks to enhance health systems as fundamental institutions within societies. With a forward-thinking approach, eClinicalMedicine aims to redefine the future of healthcare.