Machine learning for predicting the prognosis of patients with thymoma and thymic carcinoma.

IF 2.1; Zone 3 (Medicine); Q3 RESPIRATORY SYSTEM
Journal of thoracic disease Pub Date : 2025-02-28 Epub Date: 2025-02-20 DOI:10.21037/jtd-24-1263
Haijie Xu, Xirui Lin, Junhan Wu, Jianrong Chen, Jiaying Wu, Zheng Lin, Xiaoming Cai, Jiong Lin, Peishen Li, Chaoquan He, Zefeng Xie, Hansheng Wu
Journal of thoracic disease, vol. 17, no. 2, pp. 824-835. Publication type: Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11898343/pdf/
Citations: 0

Abstract

Background: Thymoma and thymic carcinoma are the most common tumors of the anterior mediastinum. However, there has been little research on applying machine learning (ML) approaches to the prognostic prediction of these tumors. This study aims to develop ML-based predictive models that accurately forecast the 5-year survival of patients with thymoma and thymic carcinoma.

Methods: Patients with malignant thymic neoplasms were identified in the Surveillance, Epidemiology, and End Results (SEER) 17 database, and their demographic and clinicopathological characteristics were collected. Five ML classifiers were trained: elastic net regularized logistic regression, random forest (RF), non-linear support vector machine (SVM), extreme gradient boosting (XGBoost), and categorical boosting (CatBoost). The hyperparameters of each algorithm were optimized by a grid search with five repeats of 10-fold cross-validation. Ensemble models were built from the three algorithms with the highest area under the receiver operating characteristic (ROC) curve (AUC) in the validation set. The best of the single and ensemble models was selected as the final model. Calibration curves and decision curves were used to evaluate calibration performance and clinical utility. For comparison, we constructed a baseline logistic regression model using age and Masaoka stage.
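The resampling and ensembling steps described in the Methods can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: the function names (`repeated_kfold`, `auc`, `top3_soft_vote`) are hypothetical, the splits are not stratified, and averaging the predicted probabilities of the three best models is an assumption, since the abstract does not state how the ensemble combines its members.

```python
import random
from statistics import mean


def repeated_kfold(n_samples, k=10, repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` reshuffled rounds of k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(k):
            test = order[fold::k]          # every k-th shuffled index forms one fold
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


def auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def top3_soft_vote(val_probs, y_val):
    """Pick the three models with the highest validation AUC and average their probabilities.

    val_probs maps a model name to its predicted probabilities on the validation set.
    Averaging (soft voting) is an assumed combination rule.
    """
    ranked = sorted(val_probs, key=lambda name: auc(y_val, val_probs[name]), reverse=True)
    chosen = ranked[:3]
    averaged = [mean(column) for column in zip(*(val_probs[name] for name in chosen))]
    return chosen, averaged
```

With `k=10` and `repeats=5`, `repeated_kfold` yields 50 train/test splits, so each hyperparameter setting in a grid would be scored on 50 held-out folds before the settings are compared.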

Results: After data cleaning, 1,363 and 841 patients were included in the overall survival (OS) dataset and disease-specific survival (DSS) dataset, respectively. CatBoost [AUC: 0.755; 95% confidence interval (CI): 0.698-0.811] had the best performance in the OS prediction for the original dataset. The ensemble model achieved the highest prognostic efficiency for the original dataset, with an AUC of 0.833 (95% CI: 0.765-0.901). Calibration showed favorable goodness of fit and was further verified with the Hosmer-Lemeshow test (CatBoost: χ²=12.63, P=0.13; ensemble model: χ²=7.61, P=0.47). The decision curve showed that the final model provided a high net benefit. The model could significantly distinguish the prognosis of patients (all P values <0.001). Finally, World Health Organization (WHO) histological classification, Masaoka stage, and age were the variables that significantly contributed to the models' prediction of OS and DSS.
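The calibration and clinical-utility measures named in the Results can be illustrated with a short standard-library sketch. It is not the authors' implementation: the function names are hypothetical, and the use of 10 risk groups (8 degrees of freedom) for the Hosmer-Lemeshow test is an assumption, as the abstract does not report the grouping. Under that assumption, the reported statistics of 12.63 and 7.61 map to P values close to the reported 0.13 and 0.47. The net benefit formula is the standard one used in decision curve analysis.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def hosmer_lemeshow(y_true, probs, groups=10):
    """Hosmer-Lemeshow statistic over `groups` equal-sized bins of sorted predicted risk."""
    pairs = sorted(zip(probs, y_true), key=lambda pair: pair[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(y for _, y in chunk)   # events seen in this bin
        expected = sum(p for p, _ in chunk)   # events predicted in this bin
        size = len(chunk)
        if 0 < expected < size:               # skip bins with a degenerate denominator
            stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return stat


def net_benefit(y_true, probs, threshold):
    """Net benefit at one threshold probability: TP/n - FP/n * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Assumed df = 8 (10 groups minus 2); compare with the P values in the abstract.
p_catboost = chi2_sf_even_df(12.63, 8)
p_ensemble = chi2_sf_even_df(7.61, 8)
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the treat-all and treat-none strategies.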

Conclusions: We trained ML-based predictive models that could accurately predict the 5-year OS and DSS of patients with thymoma and thymic carcinoma.

Source journal: Journal of thoracic disease (RESPIRATORY SYSTEM)
CiteScore: 4.60
Self-citation rate: 4.00%
Articles published: 254
About the journal: The Journal of Thoracic Disease (JTD, J Thorac Dis, pISSN: 2072-1439; eISSN: 2077-6624) was founded in Dec 2009 and was indexed in PubMed in Dec 2011 and in the Science Citation Index (SCI) in Feb 2013. It was published quarterly (Dec 2009 to Dec 2011) and bimonthly (Jan 2012 to Dec 2013), has been published monthly since Jan 2014, and is openly distributed worldwide. JTD received an impact factor of 2.365 for the year 2016. JTD publishes manuscripts that describe new findings and provide current, practical information on the diagnosis and treatment of conditions related to thoracic disease. All submission and review are conducted electronically to ensure rapid review.