Diagnostic Accuracy of Deep Learning for Predicting Lymph Node Metastasis Based on Computed Tomography in Gastric Cancer: A Systematic Review and Meta-analysis.
Armin Majd Gharamaleki, Arman Majd Gharamaleki, Alireza Amanollahi, Sarvin Tabibzadeh
Medical Journal of the Islamic Republic of Iran 39:110. Published 2025-08-20. DOI: 10.47176/mjiri.39.110 (https://doi.org/10.47176/mjiri.39.110). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12516434/pdf/
Abstract
Background: Early detection of lymph node metastasis (LNM) in gastric cancer (GC) is essential for determining the treatment strategy. Conventional methods exhibit limited efficacy, highlighting the need for more reliable approaches. Deep learning (DL) models show promise for detecting LNM on computed tomography (CT), but their performance requires comprehensive evaluation. This systematic review and meta-analysis evaluates the diagnostic performance of CT-based DL models for detecting LNM in patients with GC.
Methods: A systematic review and meta-analysis was conducted according to the PRISMA-DTA guidelines. PubMed, Embase, and Web of Science were searched up to May 5, 2025. Eligible studies used DL models to detect LNM on CT in patients with GC. Pooled estimates were calculated using a bivariate random-effects model, heterogeneity and publication bias were assessed, and clinical utility was evaluated with Fagan plots and likelihood ratio matrices. Subgroup analyses were stratified by validation type, input data type, CT phase, segmentation technique, and DL architecture. Study quality was assessed with QUADAS-2.
Results: Of the 14 included studies, 11 (5296 patients) were analyzed. In internal validation, DL feature-based models achieved a pooled area under the curve (AUC) of 0.91 (95% CI: 0.88-0.93), a sensitivity of 0.86 (95% CI: 0.75-0.92), and a specificity of 0.83 (95% CI: 0.67-0.92). Performance degraded in external validation, with specificity dropping to 0.59 (95% CI: 0.26-0.85). Models that combined DL features with radiomics features showed similar overall performance but had higher confirmatory power. In terms of clinical utility, although the models could significantly alter post-test probabilities, they lacked the certainty required to serve as standalone diagnostic tools.
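The clinical-utility statement rests on likelihood ratios and the Fagan plots named in the Methods. The following minimal Python sketch shows the arithmetic a Fagan plot performs graphically, using the pooled internal-validation sensitivity (0.86) and specificity (0.83) quoted above. The 50% pre-test probability and the function names are assumptions made for illustration only; they are not taken from the paper, and the paper's own likelihood ratios come from the bivariate model and may differ from this simple plug-in calculation.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary diagnostic test."""
    lr_pos = sensitivity / (1.0 - specificity)   # how much a positive result raises the odds
    lr_neg = (1.0 - sensitivity) / specificity   # how much a negative result lowers the odds
    return lr_pos, lr_neg


def post_test_probability(pre_test_prob, likelihood_ratio):
    """Apply Bayes' rule on the odds scale (the calculation behind a Fagan plot)."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Pooled internal-validation estimates quoted in the Results.
SENSITIVITY = 0.86
SPECIFICITY = 0.83

# Hypothetical pre-test probability of LNM; an assumption, not a value from the source.
PRE_TEST_PROB = 0.50

lr_pos, lr_neg = likelihood_ratios(SENSITIVITY, SPECIFICITY)
p_after_positive = post_test_probability(PRE_TEST_PROB, lr_pos)
p_after_negative = post_test_probability(PRE_TEST_PROB, lr_neg)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"Post-test probability after a positive result: {p_after_positive:.3f}")
print(f"Post-test probability after a negative result: {p_after_negative:.3f}")
```

Under these assumptions, LR+ is about 5.1 and LR- is about 0.17, which moves a 50% pre-test probability to roughly 83% after a positive result and roughly 14% after a negative one. Both ratios fall short of the commonly used rule-of-thumb thresholds for confirming (LR+ above 10) or excluding (LR- below 0.1) a diagnosis, which is consistent with the finding that the models shift probabilities substantially but cannot stand alone.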
Conclusion: CT-based DL models show high diagnostic accuracy but limited generalizability to external datasets, indicating overfitting. A key finding of this meta-analysis is pervasive and asymmetric heterogeneity, particularly in specificity, which suggests that technical standardization alone is insufficient. Integrating clinical variables reduces heterogeneity; however, prospective multicenter studies are needed to further enhance reproducibility.