Open-Weight Language Models and Retrieval Augmented Generation for Automated Structured Data Extraction from Diagnostic Reports: Assessment of Approaches and Parameters.

Journal impact factor: 8.1; JCR quartile: Q1 (Computer Science, Artificial Intelligence)
Mohamed Sobhi Jabal, Pranav Warman, Jikai Zhang, Kartikeye Gupta, Ayush Jain, Maciej Mazurowski, Walter Wiggins, Kirti Magudia, Evan Calabrese
Radiology: Artificial Intelligence, e240551. Published 2025-03-12. DOI: 10.1148/ryai.240551 (https://doi.org/10.1148/ryai.240551)

Abstract

"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content.

Purpose: To develop and evaluate an automated system for extracting structured clinical information from unstructured radiology and pathology reports using open-weight language models (LMs) and retrieval-augmented generation (RAG), and to assess the effects of model configuration variables on extraction performance.

Materials and Methods: This retrospective study used two datasets: 7,294 radiology reports annotated for Brain Tumor Reporting and Data System (BT-RADS) scores and 2,154 pathology reports annotated for IDH mutation status (January 2017 to July 2021). An automated pipeline was developed to benchmark various LMs and RAG configurations on the accuracy of structured data extraction from reports. The impact of model size, quantization, prompting strategies, output formatting, and inference parameters on model accuracy was systematically evaluated.

Results: The best-performing models achieved up to 98% accuracy in extracting BT-RADS scores from radiology reports and over 90% accuracy in extracting IDH mutation status from pathology reports. The best model was a medically fine-tuned Llama 3 model. Larger, newer, and domain fine-tuned models consistently outperformed older and smaller models (mean accuracy, 86% versus 75%; P < .001). Model quantization had minimal impact on performance. Few-shot prompting significantly improved accuracy (mean increase, 32% ± 32%; P = .02). RAG improved performance for complex pathology reports (+48% ± 11%; P = .001) but not for shorter radiology reports (-8% ± 31%; P = .39).

Conclusion: This study demonstrates the potential of open LMs for automated extraction of structured clinical data from unstructured clinical reports, with local, privacy-preserving application. Careful model selection, prompt engineering, and semiautomated optimization using annotated data are critical for optimal performance.

©RSNA, 2025.
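
The abstract does not give implementation details, so the following is only a minimal sketch of how a few-shot extraction benchmark of this kind could be wired together. The function names, the prompt wording, the BT-RADS label set, and the `stub_model` stand-in for a locally hosted open-weight model are all assumptions made for illustration, not the authors' pipeline.

```python
import re
from typing import Callable, Optional

# Assumed label set, for illustration only; confirm against the
# BT-RADS definition used when annotating the reports.
BT_RADS_LABELS = ("0", "1a", "1b", "2", "3a", "3b", "3c", "4")

# Longer alternatives come first so "3b" is not cut short.
_SCORE_RE = re.compile(r"\b(3[abc]|1[ab]|[024])\b", re.IGNORECASE)


def build_few_shot_prompt(report: str, examples: list[tuple[str, str]]) -> str:
    """Join an instruction, labelled example reports, and the target report."""
    parts = ["Extract the BT-RADS score from the report. Answer with the score only."]
    for text, label in examples:
        parts.append(f"Report: {text}\nBT-RADS: {label}")
    parts.append(f"Report: {report}\nBT-RADS:")
    return "\n\n".join(parts)


def parse_score(model_output: str) -> Optional[str]:
    """Return the first BT-RADS-like token in the model output, or None."""
    match = _SCORE_RE.search(model_output)
    return match.group(1).lower() if match else None


def accuracy(
    model: Callable[[str], str],
    labelled_reports: list[tuple[str, str]],
    examples: list[tuple[str, str]],
) -> float:
    """Fraction of reports whose parsed prediction equals the annotation."""
    correct = 0
    for text, gold in labelled_reports:
        prediction = parse_score(model(build_few_shot_prompt(text, examples)))
        correct += int(prediction == gold)
    return correct / len(labelled_reports)


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a local LM call; keyword rule, not a real model."""
    target = prompt.rsplit("Report:", 1)[-1]
    return "3b" if "worsening" in target.lower() else "2"


if __name__ == "__main__":
    shots = [("Stable enhancement.", "2")]
    data = [
        ("Worsening enhancement and FLAIR signal.", "3b"),
        ("No change from prior.", "2"),
    ]
    print(accuracy(stub_model, data, shots))
```

Keeping the model behind a plain `Callable[[str], str]` means the same loop can be rerun while varying model size, quantization, or prompt strategy, which is the kind of comparison the study reports; a real run would replace `stub_model` with a call to a locally served model.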

About the journal: Radiology: Artificial Intelligence is a bimonthly publication that focuses on emerging applications of machine learning and artificial intelligence in imaging across disciplines. The journal is available online and accepts multiple manuscript types, including Original Research, Technical Developments, Data Resources, Review articles, Editorials, Letters to the Editor and Replies, Special Reports, and AI in Brief.