Assessing Laterality Errors in Radiology: Comparing Generative Artificial Intelligence and Natural Language Processing

IF 4 · Zone 3 (Medicine) · Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

Journal of the American College of Radiology · Pub Date: 2024-10-01 · DOI: 10.1016/j.jacr.2024.06.014



Abstract

Purpose

We compared the performance of generative artificial intelligence (AI) (Augmented Transformer Assisted Radiology Intelligence [ATARI, Microsoft Nuance, Microsoft Corporation, Redmond, Washington]) and natural language processing (NLP) tools for identifying laterality errors in radiology reports and images.

Methods

We used an NLP-based tool (mPower, Microsoft Nuance) to identify radiology reports flagged for laterality errors in its Quality Assurance Dashboard. The NLP model detects and highlights laterality mismatches in radiology reports. From an initial pool of 1,124 radiology reports flagged by the NLP tool for laterality errors, we selected and evaluated 898 reports spanning radiography, CT, MRI, and ultrasound to ensure comprehensive coverage. A radiologist reviewed each report to assess whether the flagged laterality error was present (a reporting error, counted as a true positive) or absent (an NLP error, counted as a false positive). Next, we applied ATARI to 237 radiology reports and images from consecutive NLP true-positive (118 reports) and false-positive (119 reports) cases. We estimated the accuracy of the NLP and generative AI tools in identifying laterality errors overall and by modality.
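The review step described above amounts to labelling each NLP flag as a true or false positive and then tallying the labels overall and per modality. A minimal sketch of such a tally follows; the `ReviewedFlag` record, its field names, and the four sample rows are hypothetical placeholders for illustration, not data or code from the study.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedFlag:
    """One NLP-flagged report after radiologist review (hypothetical schema)."""
    modality: str             # e.g. "radiography", "CT", "MRI", "ultrasound"
    is_reporting_error: bool  # True = true positive, False = NLP false positive


def tally_by_modality(flags):
    """Count true and false positives separately for each modality."""
    counts = defaultdict(Counter)
    for flag in flags:
        label = "true_positive" if flag.is_reporting_error else "false_positive"
        counts[flag.modality][label] += 1
    return dict(counts)


def true_positive_fraction(counter):
    """Fraction of reviewed flags that were real reporting errors."""
    total = counter["true_positive"] + counter["false_positive"]
    return counter["true_positive"] / total if total else float("nan")


# Made-up rows purely to exercise the functions; not study data.
sample = [
    ReviewedFlag("CT", True),
    ReviewedFlag("CT", False),
    ReviewedFlag("MRI", False),
    ReviewedFlag("ultrasound", True),
]

for modality, counter in tally_by_modality(sample).items():
    print(modality, dict(counter), true_positive_fraction(counter))
```

Keeping the per-modality counts as `Counter` objects means a modality with no true positives still reports zero rather than raising a missing-key error.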

Results

Among the 898 NLP-flagged laterality errors, 64% (574 of 898) were NLP errors and 36% (324 of 898) were reporting errors. The ATARI text query feature correctly identified the absence of laterality mismatch (NLP false-positives) with 97.4% accuracy (115 of 118 reports; 95% confidence interval [CI] = 96.5%-98.3%). Combined vision and text query resulted in 98.3% accuracy (116 of 118 reports or images; 95% CI = 97.6%-99.0%), and vision query alone also had 98.3% accuracy (116 of 118 images; 95% CI = 97.6%-99.0%).
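The proportions above follow directly from the reported counts. As a minimal sketch, the snippet below recomputes them and attaches a Wilson score interval to each. The abstract does not state which interval method the authors used, so the Wilson choice and the helper names `proportion` and `wilson_interval` are assumptions, and the resulting bounds need not match the reported CIs.

```python
from math import sqrt


def proportion(successes, total):
    """Plain fraction of successes out of total."""
    return successes / total


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return center - half_width, center + half_width


# Counts as stated in the abstract: (successes, total).
COUNTS = {
    "NLP errors among flagged reports": (574, 898),
    "Reporting errors among flagged reports": (324, 898),
    "ATARI text query": (115, 118),
    "ATARI combined vision and text query": (116, 118),
}

for name, (k, n) in COUNTS.items():
    low, high = wilson_interval(k, n)
    print(f"{name}: {proportion(k, n):.1%} "
          f"(Wilson 95% interval {low:.1%} to {high:.1%})")
```

The Wilson interval is used here only because it behaves sensibly for proportions close to 100% with modest sample sizes; any other binomial interval could be substituted in `wilson_interval`.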

Conclusion

The generative AI-empowered ATARI prototype outperformed the assessed NLP tool for determining true and false laterality errors in radiology reports while enabling an image-based laterality determination. Underlying errors in ATARI text query in complex radiology reports emphasize the need for further improvement in the technology.

Source journal

Journal of the American College of Radiology (RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING)

CiteScore: 6.30

Self-citation rate: 8.90%

Articles published: 312

Review time: 34 days

Journal description: The official journal of the American College of Radiology, JACR informs its readers of timely, pertinent, and important topics affecting the practice of diagnostic radiologists, interventional radiologists, medical physicists, and radiation oncologists. In so doing, JACR improves their practices and helps optimize their role in the health care system. By providing a forum for informative, well-written articles on health policy, clinical practice, practice management, data science, and education, JACR engages readers in a dialogue that ultimately benefits patient care.