Using AI to Identify Unremarkable Chest Radiographs for Automatic Reporting

Louis Lind Plesner, Felix C Müller, Mathias W Brejnebøl, Christian Hedeager Krag, Lene C Laustrup, Finn Rasmussen, Olav Wendelboe Nielsen, Mikael Boesen, Michael B Andersen

Radiology, August 2024. DOI: 10.1148/radiol.240272
Abstract
Background: Radiology practices handle a high volume of unremarkable chest radiographs, and artificial intelligence (AI) could improve workflow by providing an automatic report.

Purpose: To estimate the proportion of unremarkable chest radiographs in which AI can correctly exclude pathology (ie, specificity) without increasing diagnostic errors.

Materials and Methods: In this retrospective study, consecutive chest radiographs in unique adult patients (≥18 years of age) were obtained January 1-12, 2020, at four Danish hospitals. Exclusion criteria included insufficient radiology reports or AI output error. Two thoracic radiologists, blinded to AI output, labeled chest radiographs as "remarkable" or "unremarkable" based on predefined unremarkable findings (reference standard). Radiology reports were classified similarly. A commercial AI tool was adapted to output a probability that a chest radiograph is "remarkable," which was used to calculate specificity at different AI sensitivities. Chest radiographs with findings missed by AI and/or the radiology report were graded by one thoracic radiologist as critical, clinically significant, or clinically insignificant. Paired proportions were compared using the McNemar test.

Results: A total of 1961 patients were included (median age, 72 years [IQR, 58-81 years]; 993 female), with one chest radiograph per patient. The reference standard labeled 1231 of 1961 chest radiographs (62.8%) as remarkable and 730 of 1961 (37.2%) as unremarkable. At 99.9%, 99.0%, and 98.0% sensitivity, the AI had a specificity of 24.5% (179 of 730 radiographs [95% CI: 21, 28]), 47.1% (344 of 730 radiographs [95% CI: 43, 51]), and 52.7% (385 of 730 radiographs [95% CI: 49, 56]), respectively. With the AI fixed at a sensitivity similar to that of the radiology reports (87.2%), 2.2% (27 of 1231 radiographs) of AI misses and 1.1% (14 of 1231) of report misses were classified as critical (P = .01); 4.1% (51 of 1231) and 3.6% (44 of 1231) as clinically significant (P = .46); and 6.5% (80 of 1231) and 8.1% (100 of 1231) as clinically insignificant (P = .11), respectively. At sensitivities of 95.4% or higher, the AI tool had a critical miss rate of 1.1% or lower.

Conclusion: A commercial AI tool used off-label could correctly exclude pathology in 24.5%-52.7% of all unremarkable chest radiographs at a sensitivity of 98% or higher. The AI had equal or lower rates of critical misses than radiology reports at sensitivities of 95.4% or higher. These results should be confirmed in a prospective study.

© RSNA, 2024. Supplemental material is available for this article. See also the editorial by Yoon and Hwang in this issue.
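The operating points reported above come from thresholding the AI's "remarkableness" probability so that a target sensitivity is met on the remarkable class, then reading off the specificity on the unremarkable class. The study's analysis code is not published here; the following is a minimal Python sketch of that procedure under stated assumptions, with hypothetical data and names (y_true, p_remarkable):

```python
import numpy as np

def specificity_at_sensitivity(y_true, p_remarkable, target_sensitivity):
    """Sketch of operating-point selection (not the authors' code).

    y_true: 1 = remarkable, 0 = unremarkable (reference standard).
    p_remarkable: AI probability that the radiograph is remarkable.
    """
    y_true = np.asarray(y_true)
    p_remarkable = np.asarray(p_remarkable)
    pos = p_remarkable[y_true == 1]  # remarkable radiographs
    neg = p_remarkable[y_true == 0]  # unremarkable radiographs
    # Choose a threshold so that ~target_sensitivity of remarkable
    # radiographs score at or above it.
    thr = np.quantile(pos, 1.0 - target_sensitivity)
    sensitivity = float(np.mean(pos >= thr))
    specificity = float(np.mean(neg < thr))  # pathology correctly excluded
    return thr, sensitivity, specificity

# Hypothetical usage on simulated scores, evaluated at the abstract's
# three reported sensitivity targets.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1961)
p = np.clip(rng.normal(0.3 + 0.5 * y, 0.2), 0.0, 1.0)
for target in (0.999, 0.99, 0.98):
    print(specificity_at_sensitivity(y, p, target))
```

In this design, every radiograph the AI classifies as unremarkable at the chosen threshold is a candidate for an automatic report, so specificity directly measures the achievable workflow reduction.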
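The paired comparison of AI misses versus radiology-report misses uses the McNemar test, which depends only on the discordant radiographs (those missed by exactly one reader). The abstract reports marginal miss counts (eg, 27 vs 14 critical misses of 1231 remarkable radiographs) but not the discordant cell counts, so the 2x2 table in this sketch is a hypothetical illustration, not the study data:

```python
from statsmodels.stats.contingency_tables import mcnemar

# Rows: AI (hit, miss); columns: radiology report (hit, miss).
# Cell values are hypothetical; only the margins (27 AI misses,
# 14 report misses, of 1231) come from the abstract.
table = [[1190, 14],
         [27, 0]]
result = mcnemar(table, exact=True)  # exact binomial test on discordant pairs
print(result.statistic, result.pvalue)
```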