{"title":"Assessing Laterality Errors in Radiology: Comparing Generative Artificial Intelligence and Natural Language Processing","authors":"","doi":"10.1016/j.jacr.2024.06.014","DOIUrl":null,"url":null,"abstract":"<div><h3>Purpose</h3><div>We compared the performance of generative artificial intelligence (AI) (Augmented Transformer Assisted Radiology Intelligence [ATARI, Microsoft Nuance, Microsoft Corporation, Redmond, Washington]) and natural language processing (NLP) tools for identifying laterality errors in radiology reports and images.</div></div><div><h3>Methods</h3><div>We used an NLP-based (mPower, Microsoft Nuance) tool to identify radiology reports flagged for laterality errors in its Quality Assurance Dashboard. The NLP model detects and highlights laterality mismatches in radiology reports. From an initial pool of 1,124 radiology reports flagged by the NLP for laterality errors, we selected and evaluated 898 reports that encompassed radiography, CT, MRI, and ultrasound modalities to ensure comprehensive coverage. A radiologist reviewed each radiology report to assess if the flagged laterality errors were present (reporting error—true-positive) or absent (NLP error—false-positive). Next, we applied ATARI to 237 radiology reports and images with consecutive NLP true-positive (118 reports) and false-positive (119 reports) laterality errors. We estimated accuracy of NLP and generative AI tools to identify overall and modality-wise laterality errors.</div></div><div><h3>Results</h3><div>Among the 898 NLP-flagged laterality errors, 64% (574 of 898) had NLP errors and 36% (324 of 898) were reporting errors. The text query ATARI feature correctly identified the absence of laterality mismatch (NLP false-positives) with a 97.4% accuracy (115 of 118 reports; 95% confidence interval [CI] = 96.5%-98.3%). Combined vision and text query resulted in 98.3% accuracy (116 of 118 reports or images; 95% CI = 97.6%-99.0%), and query alone had a 98.3% accuracy (116 of 118 images; 95% CI = 97.6%-99.0%).</div></div><div><h3>Conclusion</h3><div>The generative AI-empowered ATARI prototype outperformed the assessed NLP tool for determining true and false laterality errors in radiology reports while enabling an image-based laterality determination. Underlying errors in ATARI text query in complex radiology reports emphasize the need for further improvement in the technology.</div></div>","PeriodicalId":49044,"journal":{"name":"Journal of the American College of Radiology","volume":null,"pages":null},"PeriodicalIF":4.0000,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the American College of Radiology","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S154614402400591X","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
引用次数: 0
Abstract
Purpose
We compared the performance of generative artificial intelligence (AI) (Augmented Transformer Assisted Radiology Intelligence [ATARI, Microsoft Nuance, Microsoft Corporation, Redmond, Washington]) and natural language processing (NLP) tools for identifying laterality errors in radiology reports and images.
Methods
We used an NLP-based (mPower, Microsoft Nuance) tool to identify radiology reports flagged for laterality errors in its Quality Assurance Dashboard. The NLP model detects and highlights laterality mismatches in radiology reports. From an initial pool of 1,124 radiology reports flagged by the NLP for laterality errors, we selected and evaluated 898 reports that encompassed radiography, CT, MRI, and ultrasound modalities to ensure comprehensive coverage. A radiologist reviewed each radiology report to assess if the flagged laterality errors were present (reporting error—true-positive) or absent (NLP error—false-positive). Next, we applied ATARI to 237 radiology reports and images with consecutive NLP true-positive (118 reports) and false-positive (119 reports) laterality errors. We estimated accuracy of NLP and generative AI tools to identify overall and modality-wise laterality errors.
Results
Among the 898 NLP-flagged laterality errors, 64% (574 of 898) had NLP errors and 36% (324 of 898) were reporting errors. The text query ATARI feature correctly identified the absence of laterality mismatch (NLP false-positives) with a 97.4% accuracy (115 of 118 reports; 95% confidence interval [CI] = 96.5%-98.3%). Combined vision and text query resulted in 98.3% accuracy (116 of 118 reports or images; 95% CI = 97.6%-99.0%), and query alone had a 98.3% accuracy (116 of 118 images; 95% CI = 97.6%-99.0%).
Conclusion
The generative AI-empowered ATARI prototype outperformed the assessed NLP tool for determining true and false laterality errors in radiology reports while enabling an image-based laterality determination. Underlying errors in ATARI text query in complex radiology reports emphasize the need for further improvement in the technology.
期刊介绍:
The official journal of the American College of Radiology, JACR informs its readers of timely, pertinent, and important topics affecting the practice of diagnostic radiologists, interventional radiologists, medical physicists, and radiation oncologists. In so doing, JACR improves their practices and helps optimize their role in the health care system. By providing a forum for informative, well-written articles on health policy, clinical practice, practice management, data science, and education, JACR engages readers in a dialogue that ultimately benefits patient care.