Meta-analysis of AI-based pulmonary embolism detection: How reliable are deep learning models?

Ezio Lanza, Angela Ammirabile, Marco Francone
Computers in Biology and Medicine, volume 193, Article 110402. Published 2025-05-23.
Impact factor: 7.0 (JCR Q1, Biology; Zone 2, Medicine)
DOI: 10.1016/j.compbiomed.2025.110402
https://www.sciencedirect.com/science/article/pii/S001048252500753X

Abstract

Rationale and objectives

Deep learning (DL)–based methods show promise in detecting pulmonary embolism (PE) on CT pulmonary angiography (CTPA), potentially improving diagnostic accuracy and workflow efficiency. This meta-analysis aimed to (1) determine pooled performance estimates of DL algorithms for PE detection; and (2) compare the diagnostic efficacy of convolutional neural network (CNN)– versus U-Net–based architectures.

Materials and methods

Following PRISMA guidelines, we searched PubMed and EMBASE through April 15, 2025 for English‐language studies (2010–2025) reporting DL models for PE detection with extractable 2 × 2 data or performance metrics. True/false positives and negatives were reconstructed when necessary under an assumed 50 % PE prevalence (with 0.5 continuity correction). We approximated AUROC as the mean of sensitivity and specificity if not directly reported. Sensitivity, specificity, accuracy, PPV and NPV were pooled using a DerSimonian–Laird random-effects model with Freeman-Tukey transformation; AUROC values were combined via a fixed-effect inverse-variance approach. Heterogeneity was assessed by Cochran's Q and I². Subgroup analyses contrasted CNN versus U-Net models.
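
The reconstruction step and the AUROC approximation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the names `TwoByTwo`, `reconstruct_counts` and `approx_auroc` are hypothetical, and adding the 0.5 correction to every cell is only one common convention. The abstract states only the 50 % prevalence and the 0.5 value.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cells of a diagnostic 2 x 2 table (may be fractional after correction)."""
    tp: float
    fp: float
    fn: float
    tn: float


def reconstruct_counts(sensitivity, specificity, n_total,
                       prevalence=0.5, correction=0.5):
    """Rebuild an approximate 2 x 2 table from reported sensitivity/specificity.

    prevalence=0.5 mirrors the assumption stated in the abstract.
    ASSUMPTION: the continuity correction is added to all four cells.
    """
    n_pos = n_total * prevalence      # assumed number of PE-positive cases
    n_neg = n_total - n_pos           # assumed number of PE-negative cases
    tp = sensitivity * n_pos
    fn = n_pos - tp
    tn = specificity * n_neg
    fp = n_neg - tn
    return TwoByTwo(tp + correction, fp + correction,
                    fn + correction, tn + correction)


def approx_auroc(sensitivity, specificity):
    """Approximate AUROC as the mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2.0


def ppv(table):
    """Positive predictive value from a (reconstructed) table."""
    return table.tp / (table.tp + table.fp)


def npv(table):
    """Negative predictive value from a (reconstructed) table."""
    return table.tn / (table.tn + table.fn)


if __name__ == "__main__":
    # Hypothetical study: 200 patients, sensitivity 0.90, specificity 0.80.
    table = reconstruct_counts(0.90, 0.80, 200)
    print(table, approx_auroc(0.90, 0.80), ppv(table), npv(table))
```

PPV and NPV computed from a table rebuilt this way depend directly on the assumed prevalence, so they describe a balanced cohort rather than the case mix of any particular study.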

Results

Twenty-four studies (n = 22,984 patients) met inclusion criteria. Pooled estimates were: AUROC 0.895 (95 % CI: 0.874–0.917), sensitivity 0.894 (0.856–0.923), specificity 0.871 (0.831–0.903), accuracy 0.857 (0.833–0.882), PPV 0.832 (0.794–0.869) and NPV 0.902 (0.874–0.929). Between-study heterogeneity was high (I² ≈ 97 % for sensitivity/specificity). U-Net models exhibited higher sensitivity (0.899 vs 0.893) and CNN models higher specificity (0.926 vs 0.900); subgroup Q‐tests confirmed significant differences for both sensitivity (p = 0.0002) and specificity (p < 0.001).
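
To make the pooled estimates and the heterogeneity statistic concrete, the following Python sketch pools proportions with a DerSimonian-Laird random-effects model after a Freeman-Tukey transformation and reports Cochran's Q and I². It is an illustration only and is not taken from the paper. Several details are assumptions: the double-arcsine form of the transform with variance 1/(n + 0.5), the simplified back-transform sin²(t/2), the normal-approximation 95 % interval, and all function names. The input counts are hypothetical, not data from the included studies.

```python
import math


def freeman_tukey(events, n):
    """Double-arcsine transform of events/n and its approximate variance."""
    t = (math.asin(math.sqrt(events / (n + 1)))
         + math.asin(math.sqrt((events + 1) / (n + 1))))
    return t, 1.0 / (n + 0.5)


def dersimonian_laird(effects, variances):
    """Return fixed-effect and DerSimonian-Laird random-effects summaries."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    # Fixed-effect inverse-variance mean.
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    # I^2: percentage of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"fixed": fixed, "pooled": pooled, "se": se,
            "q": q, "tau2": tau2, "i2": i2}


def back_transform(t):
    """Simplified inverse of the double-arcsine transform (an approximation)."""
    t = min(max(t, 0.0), math.pi)
    return math.sin(t / 2.0) ** 2


def pool_proportion(events, totals):
    """Pool per-study proportions (e.g. TP out of all PE-positive cases)."""
    pairs = [freeman_tukey(x, n) for x, n in zip(events, totals)]
    res = dersimonian_laird([t for t, _ in pairs], [v for _, v in pairs])
    low = res["pooled"] - 1.96 * res["se"]
    high = res["pooled"] + 1.96 * res["se"]
    return {"estimate": back_transform(res["pooled"]),
            "ci": (back_transform(low), back_transform(high)),
            "q": res["q"], "tau2": res["tau2"], "i2": res["i2"]}


if __name__ == "__main__":
    # Hypothetical per-study true positives and PE-positive totals.
    print(pool_proportion([90, 45, 180], [100, 60, 190]))
```

The `fixed` value returned by `dersimonian_laird` corresponds to a fixed-effect inverse-variance mean of the kind the abstract describes for AUROC. A subgroup comparison such as CNN versus U-Net would additionally need a between-group Q statistic, which this sketch does not compute.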

Conclusions

DL algorithms demonstrate high diagnostic accuracy for PE detection on CTPA, with complementary strengths: U-Net architectures excel in true-positive identification, whereas CNNs yield fewer false positives. However, marked heterogeneity underscores the need for standardized, prospective validation before routine clinical implementation.
Source journal
Computers in Biology and Medicine (Engineering and Technology: Biomedical Engineering)
CiteScore: 11.70
Self-citation rate: 10.40%
Articles published: 1086
Review time: 74 days
Journal description: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.