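Diagnostic Accuracy of Deep Learning for Predicting Lymph Node Metastasis Based on Computed Tomography in Gastric Cancer: A Systematic Review and Meta-analysis.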

Q2 Medicine

Medical Journal of the Islamic Republic of Iran. Pub Date: 2025-08-20. eCollection Date: 2025-01-01. DOI: 10.47176/mjiri.39.110

Armin Majd Gharamaleki, Arman Majd Gharamaleki, Alireza Amanollahi, Sarvin Tabibzadeh


Citations: 0

Abstract



Background: Early detection of lymph node metastasis (LNM) in gastric cancer (GC) is essential for determining the treatment strategy. Conventional methods have limited efficacy, highlighting the need for more reliable approaches. Deep learning (DL) models show promise for detecting LNM on computed tomography (CT), but their performance requires comprehensive evaluation. This systematic review and meta-analysis evaluates the diagnostic performance of CT-based DL models for detecting LNM in patients with GC.

Methods: A systematic review and meta-analysis was conducted according to the PRISMA-DTA guidelines. PubMed, Embase, and Web of Science were searched up to May 5, 2025. Eligible studies used DL models to detect LNM on CT in patients with GC. Pooled estimates were calculated using a bivariate random-effects model, heterogeneity and publication bias were assessed, and clinical utility was evaluated with Fagan plots and likelihood ratio matrices. Subgroup analyses were stratified by validation type, input data type, CT phase, segmentation technique, and DL architecture. Study quality was assessed with QUADAS-2.

Results: Of the 14 included studies, 11 studies comprising 5296 patients were analyzed. In internal validation, DL feature-based models achieved a pooled area under the curve (AUC) of 0.91 (95% CI: 0.88-0.93), a sensitivity of 0.86 (95% CI: 0.75-0.92), and a specificity of 0.83 (95% CI: 0.67-0.92). Performance degraded in external validation, with specificity dropping to 0.59 (95% CI: 0.26-0.85). Models that combined DL features with radiomics features showed similar overall performance but higher confirmatory power. In terms of clinical utility, the models could significantly alter post-test probabilities, but they lacked the certainty required to serve as standalone diagnostic tools.

Conclusion: CT-based DL models show high diagnostic accuracy but limited generalizability to external datasets, which suggests overfitting. A key finding of this meta-analysis is that heterogeneity was pervasive and asymmetric, particularly in specificity, which suggests that technical standardization alone is insufficient. Integrating clinical variables reduces heterogeneity; however, prospective multicenter studies are needed to further improve reproducibility.
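
The clinical-utility step mentioned in the Methods (Fagan plots and likelihood ratios) can be illustrated with a small worked example. The sketch below is not the authors' analysis: it plugs the pooled internal-validation point estimates from the Results (sensitivity 0.86, specificity 0.83) into the standard likelihood-ratio formulas and applies Bayes' rule in odds form, which is the calculation a Fagan nomogram performs graphically. The pre-test probability of 0.50 and all function names are assumptions chosen for illustration, and likelihood ratios derived directly from the paper's bivariate model may differ from these back-of-the-envelope values.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a binary diagnostic test."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: post-test odds = pre-test odds * LR."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Pooled internal-validation point estimates quoted in the abstract.
SENSITIVITY = 0.86
SPECIFICITY = 0.83

# ASSUMPTION: illustrative pre-test probability, not taken from the paper.
PRE_TEST_PROB = 0.50

lr_pos, lr_neg = likelihood_ratios(SENSITIVITY, SPECIFICITY)
p_pos = post_test_probability(PRE_TEST_PROB, lr_pos)
p_neg = post_test_probability(PRE_TEST_PROB, lr_neg)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"P(LNM | positive result) = {p_pos:.2f}")
print(f"P(LNM | negative result) = {p_neg:.2f}")
```

Under these assumptions the positive likelihood ratio is roughly 5 and the negative likelihood ratio roughly 0.17, so a result shifts the probability of LNM noticeably in either direction without reaching the commonly used rule-of-thumb thresholds for confirmation (LR+ above 10) or exclusion (LR- below 0.1). That pattern is consistent with the abstract's statement that the models are not sufficient as standalone diagnostic tools. The abstract does not report external-validation sensitivity, so the same calculation is not repeated for that setting.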


Source journal

Medical Journal of the Islamic Republic of Iran Medicine-Medicine (all)

CiteScore

2.40

Self-citation rate

0.00%

Publication count

Review time

8 weeks