Meta-analysis of AI-based pulmonary embolism detection: How reliable are deep learning models?

IF 7 | Zone 2, Medicine | JCR Q1, Biology

Computers in biology and medicine Pub Date : 2025-05-23 DOI:10.1016/j.compbiomed.2025.110402

Ezio Lanza, Angela Ammirabile, Marco Francone

Computers in biology and medicine, volume 193, Article 110402. Publication type: Journal Article. Open access: no. https://www.sciencedirect.com/science/article/pii/S001048252500753X

Citations: 0

Abstract

Rationale and objectives

Deep learning (DL)–based methods show promise in detecting pulmonary embolism (PE) on CT pulmonary angiography (CTPA), potentially improving diagnostic accuracy and workflow efficiency. This meta-analysis aimed to (1) determine pooled performance estimates of DL algorithms for PE detection; and (2) compare the diagnostic efficacy of convolutional neural network (CNN)– versus U-Net–based architectures.

Materials and methods

Following PRISMA guidelines, we searched PubMed and EMBASE through April 15, 2025 for English‐language studies (2010–2025) reporting DL models for PE detection with extractable 2 × 2 data or performance metrics. True/false positives and negatives were reconstructed when necessary under an assumed 50 % PE prevalence (with 0.5 continuity correction). We approximated AUROC as the mean of sensitivity and specificity if not directly reported. Sensitivity, specificity, accuracy, PPV and NPV were pooled using a DerSimonian–Laird random-effects model with Freeman-Tukey transformation; AUROC values were combined via a fixed-effect inverse-variance approach. Heterogeneity was assessed by Cochran's Q and I². Subgroup analyses contrasted CNN versus U-Net models.
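The reconstruction step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the `n_total` argument, and the rule of applying the 0.5 continuity correction to all four cells only when some cell is zero are assumptions, since the abstract does not state how the correction was applied.

```python
def reconstruct_2x2(sensitivity, specificity, n_total,
                    prevalence=0.5, correction=0.5):
    """Rebuild TP/FN/FP/TN from reported sensitivity and specificity.

    prevalence=0.5 mirrors the assumed 50 % PE prevalence in the text.
    Applying the correction only when a cell is zero is an assumption.
    """
    n_pos = n_total * prevalence          # assumed number of PE-positive cases
    n_neg = n_total - n_pos               # assumed number of PE-negative cases
    tp = sensitivity * n_pos
    fn = n_pos - tp
    tn = specificity * n_neg
    fp = n_neg - tn
    cells = {"tp": tp, "fn": fn, "fp": fp, "tn": tn}
    # Continuity correction: add 0.5 to every cell if any cell is zero.
    if any(value == 0 for value in cells.values()):
        cells = {name: value + correction for name, value in cells.items()}
    return cells


def approx_auroc(sensitivity, specificity):
    """Fallback used when AUROC is not reported: mean of the two rates."""
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    # Hypothetical study: 200 patients, sensitivity 0.90, specificity 0.80.
    print(reconstruct_2x2(0.90, 0.80, 200))
    print(approx_auroc(0.90, 0.80))
```

Note that the mean of sensitivity and specificity corresponds to the area under a ROC curve with a single operating point, so it is only a coarse stand-in for a reported AUROC.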

Results

Twenty-four studies (n = 22,984 patients) met inclusion criteria. Pooled estimates were: AUROC 0.895 (95 % CI: 0.874–0.917), sensitivity 0.894 (0.856–0.923), specificity 0.871 (0.831–0.903), accuracy 0.857 (0.833–0.882), PPV 0.832 (0.794–0.869) and NPV 0.902 (0.874–0.929). Between-study heterogeneity was high (I² ≈ 97 % for sensitivity/specificity). U-Net models exhibited higher sensitivity (0.899 vs 0.893) and CNN models higher specificity (0.926 vs 0.900); subgroup Q‐tests confirmed significant differences for both sensitivity (p = 0.0002) and specificity (p < 0.001).
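For readers who want to see how pooled proportions and I² of this kind are typically computed, here is a minimal sketch of DerSimonian–Laird random-effects pooling on Freeman-Tukey (double arcsine) transformed proportions. The study counts in the example are hypothetical, the function names are illustrative, and the back-transformation uses the simple sin² approximation rather than an exact inverse; none of this is taken from the paper itself.

```python
import math


def freeman_tukey(events, n):
    """Double arcsine transform of events/n and its approximate variance."""
    t = 0.5 * (math.asin(math.sqrt(events / (n + 1)))
               + math.asin(math.sqrt((events + 1) / (n + 1))))
    return t, 1.0 / (4 * n + 2)


def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    # Fixed-effect inverse-variance estimate, needed for Cochran's Q.
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, q, tau2, i2


def pool_proportions(counts):
    """counts: list of (events, n) pairs, e.g. (TP, TP + FN) for sensitivity."""
    transformed = [freeman_tukey(x, n) for x, n in counts]
    effects = [t for t, _ in transformed]
    variances = [v for _, v in transformed]
    pooled, se, q, tau2, i2 = dersimonian_laird(effects, variances)

    def back(t):
        # Approximate inverse of the transform, clipped to its valid range.
        return math.sin(min(max(t, 0.0), math.pi / 2)) ** 2

    return {"pooled": back(pooled),
            "ci_low": back(pooled - 1.96 * se),
            "ci_high": back(pooled + 1.96 * se),
            "q": q, "tau2": tau2, "i2": i2}


if __name__ == "__main__":
    # Hypothetical per-study (true positives, PE-positive cases) counts.
    print(pool_proportions([(95, 100), (60, 100), (80, 100)]))
```

An I² near 97 % means that almost all of the observed variation in the transformed proportions is attributed to between-study differences rather than sampling error, which is why the pooled point estimates should be read with caution.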

Conclusions

DL algorithms demonstrate high diagnostic accuracy for PE detection on CTPA, with complementary strengths: U-Net architectures excel in true-positive identification, whereas CNNs yield fewer false positives. However, marked heterogeneity underscores the need for standardized, prospective validation before routine clinical implementation.

Source journal

Computers in biology and medicine (Engineering & Technology: Biomedical Engineering)

CiteScore

11.70

Self-citation rate

10.40%

Articles published

1086

Review time

74 days

About the journal: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.