A real-world evaluation of the diagnostic accuracy of radiologists using positive predictive values verified from deep learning and natural language processing chest algorithms deployed retrospectively.

IF 2.1

BJR open Pub Date : 2023-12-12 eCollection Date: 2024-01-01 DOI:10.1093/bjro/tzad009

Bahadar S Bhatia, John F Morlese, Sarah Yusuf, Yiting Xie, Bob Schallhorn, David Gruen

BJR open, volume 6, issue 1, article tzad009. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10860529/pdf/


Abstract

Objectives: This diagnostic study retrospectively assessed the accuracy of radiologists using the deep learning and natural language processing chest algorithms implemented in Clinical Review version 3.2 for pneumothorax and rib fractures in digital chest X-ray radiographs (CXR), and for aortic aneurysm, pulmonary nodules, emphysema, and pulmonary embolism in CT images.

Methods: The study design was double-blind (artificial intelligence [AI] algorithms and humans), retrospective, and non-interventional, and the study was conducted at a single NHS Trust. Adult patients (≥18 years old) scheduled for CXR and CT were invited to enroll as participants through an opt-out process. Reports and images were de-identified and processed retrospectively, and AI-flagged discrepant findings were assigned to two lead radiologists, each blinded to patient identifiers and to the original radiologist. The radiologist's findings for each clinical condition were tallied as a verified discrepancy (true positive) or not (false positive).
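The tally described in the Methods maps directly onto a positive predictive value: verified discrepancies divided by all AI-flagged findings that were reviewed. The following is a minimal sketch of that arithmetic, not the authors' pipeline. The function names and the review outcomes are hypothetical and chosen only for illustration; they are not data from the study.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def ppv(true_positives: int, false_positives: int) -> Optional[float]:
    """PPV = TP / (TP + FP); None when nothing was flagged and reviewed."""
    flagged = true_positives + false_positives
    if flagged == 0:
        return None
    return true_positives / flagged


def tally_reviews(reviews: Iterable[Tuple[str, bool]]) -> Dict[str, Optional[float]]:
    """Compute a per-condition PPV from (condition, verified) review outcomes.

    verified=True  -> reviewer confirmed the AI-flagged discrepancy (true positive)
    verified=False -> reviewer did not confirm it (false positive)
    """
    reviews = list(reviews)
    counts = Counter(reviews)
    conditions = sorted({condition for condition, _ in reviews})
    return {
        condition: ppv(counts[(condition, True)], counts[(condition, False)])
        for condition in conditions
    }


# Hypothetical review outcomes, NOT taken from the paper.
example_reviews = [
    ("rib fracture", True),
    ("rib fracture", False),
    ("rib fracture", False),
    ("rib fracture", False),
    ("emphysema", True),
    ("emphysema", False),
]

example_ppv = tally_reviews(example_reviews)
for condition, value in example_ppv.items():
    print(f"{condition}: PPV = {value:.1%}")
```

Returning `None` when no flagged findings were reviewed keeps an empty category distinct from a category in which every flag turned out to be a false positive.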

Results: The rates of missed findings were 0.02% for rib fractures, 0.51% for aortic aneurysm, 0.32% for pulmonary nodules, 0.92% for emphysema, and 0.28% for pulmonary embolism. The positive predictive values (PPVs) were: pneumothorax (0%), rib fractures (5.6%), aortic dilatation (43.2%), pulmonary emphysema (46.0%), pulmonary embolus (11.5%), and pulmonary nodules (9.2%). The PPV for pneumothorax was nil owing to lack of available studies that were analysed for outpatient activity.

Conclusions: The number of missed findings was far lower than generally predicted. The chest algorithms deployed retrospectively were a useful quality tool, and AI augmented the radiologists' workflow.

Advances in knowledge: Our radiologists' rates of missed findings were 0.02% for rib fractures on CXR and, on CT studies, 0.51% for aortic dilatation, 0.32% for pulmonary nodules, 0.92% for pulmonary emphysema, and 0.28% for pulmonary embolism, all retrospectively evaluated with AI used as a quality tool to flag potential missed findings. It is important to account for the prevalence of these chest conditions in their clinical context and to use appropriate clinical thresholds for decision-making, rather than relying solely on AI.
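The point about prevalence can be made concrete with Bayes' rule: for a fixed sensitivity and specificity, PPV falls as the target finding becomes rarer in the population being reviewed. The sketch below uses the standard formula PPV = (sens × prev) / (sens × prev + (1 − spec) × (1 − prev)). The sensitivity, specificity, and prevalence values are assumptions chosen for illustration; the abstract does not report them.

```python
def ppv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    for name, value in (
        ("sensitivity", sensitivity),
        ("specificity", specificity),
        ("prevalence", prevalence),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")

    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    flagged = true_positive_rate + false_positive_rate
    if flagged == 0.0:
        raise ValueError("no positive results are possible with these inputs")
    return true_positive_rate / flagged


# Hypothetical operating point (assumed, not from the paper).
ASSUMED_SENSITIVITY = 0.90
ASSUMED_SPECIFICITY = 0.90

for assumed_prevalence in (0.50, 0.10, 0.01):
    value = ppv_from_prevalence(ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY, assumed_prevalence)
    print(f"prevalence {assumed_prevalence:.0%}: PPV = {value:.1%}")
```

Under these assumed values the same algorithm yields a much lower PPV when the finding is rare, which is one reason a low PPV for flagged potential misses need not, by itself, indicate a poor algorithm.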


