Machine learning models for prediction of lymph node metastasis in patients with gastric cancer: a Chinese single-centre study with external validation in an Asian American population.

IF 2.4 | Region 3 (Medicine) | JCR Q1 | MEDICINE, GENERAL & INTERNAL

BMJ Open | Pub Date: 2025-03-25 | DOI: 10.1136/bmjopen-2024-098476

Qian Li, Shangcheng Yan, Weiran Yang, Zhuan Du, Ming Cheng, Renwei Chen, Qiankun Shao, Yuan Tian, Mengchao Sheng, Wei Peng, Yongyou Wu

Citation: BMJ Open, volume 15, issue 3, e098476. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11938237/pdf/

Citations: 0

Abstract

Objective: To develop and validate machine learning (ML)-based models to predict lymph node metastasis (LNM) in patients with gastric cancer (GC).

Design: Retrospective cohort study.

Setting: Second Affiliated Hospital of Soochow University.

Participants: A total of 500 inpatients from the Second Affiliated Hospital of Soochow University, collected retrospectively between 1 April 2018 and 31 March 2023, were used as the training set, while 824 Asian patients from the Surveillance, Epidemiology and End Results database comprised the external validation set.

Main outcome measures: Prediction models were developed using multiple ML algorithms, including logistic regression, support vector machine, k-nearest neighbours, naive Bayes, decision tree (DT), gradient boosting DT, random forest and artificial neural network (ANN). The predictive value of these models was validated and evaluated through receiver operating characteristic curves, precision-recall (PR) curves, calibration curves, decision curve analysis and accuracy metrics.
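Decision curve analysis, one of the evaluation methods listed above, compares models by their net benefit at a chosen risk threshold. The following is a minimal sketch of the standard net-benefit formula, not the authors' code; the function names and the toy labels and probabilities are hypothetical and are not taken from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating every patient whose predicted risk >= threshold.

    Standard decision-curve formula: TP/n - FP/n * (pt / (1 - pt)).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # harm of a false positive relative to a true positive
    return tp / n - (fp / n) * weight


def treat_all_net_benefit(y_true, threshold):
    """Reference strategy: treat everyone regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data (1 = lymph node metastasis present), for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

for pt in (0.2, 0.5):
    print(pt, net_benefit(y_true, y_prob, pt), treat_all_net_benefit(y_true, pt))
```

A model is clinically useful at a given threshold when its net benefit exceeds both the treat-all reference and the treat-none reference, whose net benefit is zero.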

Results: Among the ML algorithms, the ANN outperformed others, achieving the highest accuracy (0.722; 95% CI: 0.692 to 0.751), precision (0.732; 95% CI: 0.694 to 0.776), F1 score (0.733; 95% CI: 0.695 to 0.773), specificity (0.728; 95% CI: 0.684 to 0.770) and area under the PR curve (0.781; 95% CI: 0.740 to 0.821) in the external validation results. Moreover, it demonstrated superior calibration and clinical utility. Shapley Additive Explanations analysis identified the depth of invasion, tumour size and Lauren classification as the most influential predictors of LNM in patients with GC. Furthermore, a user-friendly web application was developed to provide individual prediction results.
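The metrics reported above (accuracy, precision, F1 score, specificity) all derive from a confusion matrix, and each is given with a 95% CI. The abstract does not say how the intervals were obtained; the sketch below assumes a percentile bootstrap, which is one common choice. The function names and toy predictions are hypothetical and do not reproduce the study's data.

```python
import random


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, specificity and F1 from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample patients with replacement, recompute the metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = classification_metrics(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )[metric]
        if value == value:  # drop NaN from degenerate resamples
            stats.append(value)
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper


# Hypothetical toy predictions, for illustration only.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(y_true, y_pred)
lower, upper = bootstrap_ci(y_true, y_pred, "accuracy")
print(metrics)
print("accuracy 95% CI:", lower, "to", upper)
```

With only ten toy cases the interval is very wide; the comparatively narrow intervals in the abstract reflect the 824-patient external validation set.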

Conclusions: This study introduces an accurate, reliable and clinically applicable approach for predicting the risk of LNM in patients with GC. The model demonstrates its potential to enhance the personalised management of GC in diverse populations, supported by external validation and an accessible web application for practical use.


Source journal

BMJ Open (MEDICINE, GENERAL & INTERNAL)

CiteScore: 4.40

Self-citation rate: 3.40%

Articles published: 4510

Review time: 2-3 weeks

About the journal: BMJ Open is an online, open access journal, dedicated to publishing medical research from all disciplines and therapeutic areas. The journal publishes all research study types, from study protocols to phase I trials to meta-analyses, including small or specialist studies. Publishing procedures are built around fully open peer review and continuous publication, publishing research online as soon as the article is ready.