Accuracy of Large Language Model-based Automatic Calculation of Ovarian-Adnexal Reporting and Data System MRI Scores from Pelvic MRI Reports.

IF 12.1 · Zone 1 (Medicine) · Q1 · RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
Radiology Pub Date : 2025-04-01 DOI:10.1148/radiol.241554
Rajesh Bhayana, Ankush Jajodia, Tanya Chawla, Yangqing Deng, Genevieve Bouchard-Fortier, Masoom Haider, Satheesh Krishna
Radiology, volume 315, issue 1, e241554.

Abstract

Background: Ovarian-Adnexal Reporting and Data System (O-RADS) for MRI helps assign malignancy risk, but radiologist adoption is inconsistent. Automatic assignment of O-RADS scores from reports could increase adoption and accuracy.

Purpose: To evaluate the accuracy of large language models (LLMs), after strategic optimization, for automatically calculating O-RADS scores from reports.

Materials and Methods: This retrospective single-center study from a large quaternary care cancer center included consecutive gadolinium chelate-enhanced pelvic MRI reports with at least one assigned O-RADS score from July 2021 to October 2023. Reports from January 2018 to October 2019 (before O-RADS MRI implementation) were randomly selected for additional testing. Reference standard O-RADS scores were determined by radiologists interpreting reports. After prompt optimization using a subset of reports, two LLM-based strategies were evaluated: few-shot learning with GPT-4 (version 0613; OpenAI) prompted with O-RADS rules ("LLM only") and a hybrid strategy leveraging GPT-4 to classify features fed into a deterministic formula ("hybrid"). Accuracy of each model and originally reported scores were calculated and compared using the McNemar test.

Results: A total of 284 reports from 284 female patients (mean age, 53.2 years ± 16.3 [SD]) with 372 adnexal lesions were included: 10 reports in the training set (16 lesions), 134 reports in the internal test set 1 (173 lesions; 158 O-RADS assigned), and 140 reports in internal test set 2 (183 lesions). For assigning O-RADS MRI scores, the hybrid model accuracy (97%; 168 of 173) outperformed LLM-only model (90%; 155 of 173; P = .006). For lesions with an originally reported O-RADS score, hybrid model accuracy exceeded that of reporting radiologists (97% [153 of 158] vs 88% [139 of 158]; P = .004). Hybrid model also outperformed LLM-only model for 183 lesions from before O-RADS implementation (95% [173 of 183] vs 87% [159 of 183], respectively; P = .01).

Conclusion: A hybrid LLM-based application, combining LLM feature classification with deterministic elements, accurately assigned O-RADS MRI scores from report descriptions, exceeding both an LLM-only strategy and the original reporting radiologist.

© RSNA, 2025. Supplemental material is available for this article.
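
The accuracies quoted in the Results can be recomputed from the stated counts, and the paired comparison can be illustrated with a McNemar test. The following is a minimal sketch, not the authors' analysis code: the abstract does not say whether the exact or the asymptotic form of the test was used, and it does not report the discordant-pair counts, so `mcnemar_exact` and the counts `b` and `c` in the usage note are hypothetical and for illustration only.

```python
from math import comb

# (correct, total) pairs exactly as quoted in the abstract.
REPORTED = {
    "hybrid, test set 1": (168, 173),
    "LLM only, test set 1": (155, 173),
    "hybrid, originally scored lesions": (153, 158),
    "radiologists, originally scored lesions": (139, 158),
    "hybrid, test set 2": (173, 183),
    "LLM only, test set 2": (159, 183),
}


def accuracy_percent(correct: int, total: int) -> int:
    """Accuracy as a whole-number percentage."""
    return round(100 * correct / total)


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: lesions scored correctly by method A only (hypothetical input).
    c: lesions scored correctly by method B only (hypothetical input).
    Under the null hypothesis, each discordant pair favors either
    method with probability 0.5, so the smaller count follows a
    Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    for label, (correct, total) in REPORTED.items():
        print(f"{label}: {accuracy_percent(correct, total)}%")
```

Only the lesions on which the two methods disagree enter the test, which is why the paired counts, and not the two accuracies alone, determine the P value. For example, `mcnemar_exact(1, 9)` evaluates a made-up split of 10 discordant lesions; it does not reproduce any P value reported above.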

Source journal
Radiology (Medicine: Nuclear Medicine)
CiteScore: 35.20
Self-citation rate: 3.00%
Articles published: 596
Review time: 3.6 months
Journal description: Published regularly since 1923 by the Radiological Society of North America (RSNA), Radiology has long been recognized as the authoritative reference for the most current, clinically relevant and highest quality research in the field of radiology. Each month the journal publishes approximately 240 pages of peer-reviewed original research, authoritative reviews, well-balanced commentary on significant articles, and expert opinion on new techniques and technologies. Radiology publishes cutting edge and impactful imaging research articles in radiology and medical imaging in order to help improve human health.