A real-world evaluation of the diagnostic accuracy of radiologists using positive predictive values verified from deep learning and natural language processing chest algorithms deployed retrospectively.

BJR open. Pub Date: 2023-12-12. eCollection Date: 2024-01-01. DOI: 10.1093/bjro/tzad009
Bahadar S Bhatia, John F Morlese, Sarah Yusuf, Yiting Xie, Bob Schallhorn, David Gruen
BJR open, volume 6, issue 1, article tzad009. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10860529/pdf/

Abstract

Objectives: This diagnostic study assessed the accuracy of radiologists retrospectively, using the deep learning and natural language processing chest algorithms implemented in Clinical Review version 3.2 for: pneumothorax, rib fractures in digital chest X-ray radiographs (CXR); aortic aneurysm, pulmonary nodules, emphysema, and pulmonary embolism in CT images.

Methods: The study design was double-blind (artificial intelligence [AI] algorithms and humans), retrospective, and non-interventional, and the study was conducted at a single NHS Trust. Adult patients (≥18 years old) scheduled for CXR and CT were invited to enroll as participants through an opt-out process. Reports and images were de-identified and processed retrospectively, and AI-flagged discrepant findings were assigned to two lead radiologists, each blinded to patient identifiers and to the identity of the original radiologist. The radiologist's findings for each clinical condition were tallied as a verified discrepancy (true positive) or not (false positive).
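
The tally described above maps onto the standard definition of positive predictive value, PPV = TP / (TP + FP). The following is a minimal sketch of that arithmetic in Python, not the study's own analysis code. The function name, the condition labels, and the counts are hypothetical and chosen only for illustration; they are not data from the study. Returning `None` when no flagged studies were reviewed is a design choice of this sketch.

```python
from typing import Optional


def positive_predictive_value(true_positives: int, false_positives: int) -> Optional[float]:
    """Return TP / (TP + FP), or None when no flagged studies were reviewed."""
    if true_positives < 0 or false_positives < 0:
        raise ValueError("counts must be non-negative")
    flagged = true_positives + false_positives
    if flagged == 0:
        # No reviewed flags: the ratio is undefined in this sketch.
        return None
    return true_positives / flagged


# Hypothetical tallies (verified discrepancies, unverified flags); NOT study data.
example_tallies = {
    "condition_a": (3, 9),
    "condition_b": (0, 5),
    "condition_c": (0, 0),
}

for condition, (tp, fp) in example_tallies.items():
    ppv = positive_predictive_value(tp, fp)
    label = "undefined (no reviewed flags)" if ppv is None else f"{ppv:.1%}"
    print(f"{condition}: PPV = {label}")
```

In this sketch a condition with reviewed flags but no verified discrepancies has a PPV of zero, while a condition with no reviewed flags at all has no PPV to report.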

Results: The missed findings were: 0.02% rib fractures, 0.51% aortic aneurysm, 0.32% pulmonary nodules, 0.92% emphysema, and 0.28% pulmonary embolism. The positive predictive values (PPVs) were: pneumothorax (0%), rib fractures (5.6%), aortic dilatation (43.2%), pulmonary emphysema (46.0%), pulmonary embolus (11.5%), and pulmonary nodules (9.2%). The PPV for pneumothorax was nil owing to lack of available studies that were analysed for outpatient activity.

Conclusions: The number of missed findings was far less than generally predicted. The chest algorithms deployed retrospectively were a useful quality tool and AI augmented the radiologists' workflow.

Advances in knowledge: Our radiologists' missed findings were 0.02% for rib fractures on CXR, and 0.51% for aortic dilatation, 0.32% for pulmonary nodules, 0.92% for pulmonary emphysema, and 0.28% for pulmonary embolism on CT, all retrospectively evaluated with AI used as a quality tool to flag potential missed findings. It is important to account for the prevalence of these chest conditions in clinical context and to use appropriate clinical thresholds for decision-making, rather than relying solely on AI.
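
The point about prevalence can be illustrated with the textbook relationship between PPV, sensitivity, specificity, and prevalence: PPV = (sens × prev) / (sens × prev + (1 − spec) × (1 − prev)). The sketch below is a generic illustration of that formula, not a model of the algorithms evaluated in the study. The function name and the operating point (sensitivity and specificity of 0.9) are assumptions made for the example, and none of the numbers come from the paper.

```python
def ppv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV via Bayes' rule for a test with the given sensitivity and specificity."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    true_positive_mass = sensitivity * prevalence
    false_positive_mass = (1.0 - specificity) * (1.0 - prevalence)
    total_positive_mass = true_positive_mass + false_positive_mass
    if total_positive_mass == 0.0:
        raise ValueError("the test never returns a positive; PPV is undefined")
    return true_positive_mass / total_positive_mass


# Hypothetical operating point, held fixed while prevalence varies.
SENSITIVITY = 0.9
SPECIFICITY = 0.9

for prevalence in (0.001, 0.01, 0.1, 0.5):
    ppv = ppv_from_prevalence(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.3f} -> PPV {ppv:.3f}")
```

With the operating point held fixed, PPV falls as the target finding becomes rarer, which is one reason a flagging tool applied to uncommon events can show a low PPV even when its sensitivity and specificity are high.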
