Construction and validation of a machine learning model to predict the risk of nasopharyngeal carcinoma using multimodal clinical data: a single-center, retrospective study.

IF 2.5 · Zone 3 (Medicine) · JCR Q2 (Oncology)
Xiao Li, Zuheng Wang, Wenting Chen, Chunmeng Wei, Wenhao Lu, Rongbin Zhou, Fubo Wang, Leifeng Liang
Clinical & Translational Oncology. Journal Article. Published 2025-07-15. DOI: 10.1007/s12094-025-03992-0 (https://doi.org/10.1007/s12094-025-03992-0)
Citations: 0

Abstract

Objective: Early detection and treatment of nasopharyngeal carcinoma (NPC) are critical for improving patient prognosis. This study aimed to develop and compare multiple machine learning (ML) models built on multimodal clinical data in order to identify a predictive model for NPC risk, improve diagnostic accuracy, and guide personalized treatment strategies.

Methods: Clinical data were retrospectively collected from 1337 patients suspected of having NPC at the First People's Hospital of Yulin. Feature selection was performed using the least absolute shrinkage and selection operator (LASSO) regression. Patients were divided into training and test sets (80:20 ratio), and seven ML models were developed based on the training set. Model performance was assessed using metrics such as the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. The best-performing model was further evaluated through decision curve analysis (DCA), calibration, and learning curves. SHapley Additive exPlanations (SHAP) were used to interpret key clinical features.
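
The split and the discrimination metrics named in the Methods can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the function names, the 0.5 decision threshold, the random seed, and the toy labels and scores are all assumptions made for illustration, and the abstract does not say whether the split was stratified.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case scores higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity when score >= threshold means 'predicted NPC'."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Placeholder patient IDs only; no real clinical data is used here.
patient_ids = list(range(1337))
train_ids, test_ids = train_test_split(patient_ids)

# Toy labels (1 = NPC) and hypothetical model scores, not study results.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
print(len(train_ids), len(test_ids))
print(auc(labels, scores), sensitivity_specificity(labels, scores))
```

The pairwise AUC is quadratic in sample size, which is acceptable for a cohort of this size; a rank-based formulation gives the same value more efficiently.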

Results: Seven models were developed using 17 clinical features selected from 53 parameters. The gradient boosting decision tree (GBDT) model demonstrated superior performance (AUC of 0.95 in the training cohort and 0.82 in the validation cohort). Calibration curves and DCA confirmed the model's strong accuracy and clinical benefit. SHAP analysis revealed that age, lymphocyte percentage, serum albumin, sex, and EBV IgM were the five most significant predictors of NPC risk.
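
Decision curve analysis, mentioned in the Results, compares a model's net benefit with a "treat everyone" strategy across threshold probabilities. The sketch below uses the standard net-benefit formula, TP/n − (FP/n) · pt/(1 − pt); the function names and the toy labels and scores are hypothetical and are not taken from the study.

```python
def net_benefit(labels, scores, threshold):
    """Net benefit of acting on every case with score >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the reference strategy that acts on every patient."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy labels (1 = NPC) and hypothetical model scores, not study results.
dca_labels = [1, 1, 1, 0, 0, 0]
dca_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
for pt in (0.2, 0.5, 0.7):
    print(pt, net_benefit(dca_labels, dca_scores, pt),
          net_benefit_treat_all(dca_labels, pt))
```

A model shows clinical benefit at a given threshold when its net benefit exceeds both the treat-all value and zero, the latter being the net benefit of treating no one.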

Conclusion: The GBDT-based ML model, using multimodal clinical data, accurately identifies patients at high risk for NPC, providing a valuable tool for early screening and personalized treatment strategies.

Source journal
CiteScore: 6.20
Self-citation rate: 2.90%
Articles published: 240
Review time: 1 month
About the journal: Clinical and Translational Oncology is an international journal devoted to fostering interaction between experimental and clinical oncology. It covers all aspects of research on cancer, from the more basic discoveries dealing with both cell and molecular biology of tumour cells, to the most advanced clinical assays of conventional and new drugs. In addition, the journal has a strong commitment to facilitating the transfer of knowledge from the basic laboratory to clinical practice, with the publication of educational series devoted to closing the gap between molecular and clinical oncologists. Molecular biology of tumours, identification of new targets for cancer therapy, and new technologies for research and treatment of cancer are the major themes covered by the educational series. Full research articles on a broad spectrum of subjects, including the molecular and cellular bases of disease, aetiology, pathophysiology, pathology, epidemiology, clinical features, and the diagnosis, prognosis and treatment of cancer, will be considered for publication.