Implementing a Resource-Light and Low-Code Large Language Model System for Information Extraction from Mammography Reports: A Pilot Study.

Fabio Dennstädt, Simon Fauser, Nikola Cihoric, Max Schmerder, Paolo Lombardo, Grazia Maria Cereghetti, Sandro von Däniken, Thomas Minder, Jaro Meyer, Lawrence Chiang, Roberto Gaio, Luc Lerch, Irina Filchenko, Daniel Reichenpfader, Kerstin Denecke, Caslav Vojvodic, Igor Tatalovic, André Sander, Janna Hastings, Daniel M Aebersold, Hendrik von Tengg-Kobligk, Knud Nairz
{"title":"Implementing a Resource-Light and Low-Code Large Language Model System for Information Extraction from Mammography Reports: A Pilot Study.","authors":"Fabio Dennstädt, Simon Fauser, Nikola Cihoric, Max Schmerder, Paolo Lombardo, Grazia Maria Cereghetti, Sandro von Däniken, Thomas Minder, Jaro Meyer, Lawrence Chiang, Roberto Gaio, Luc Lerch, Irina Filchenko, Daniel Reichenpfader, Kerstin Denecke, Caslav Vojvodic, Igor Tatalovic, André Sander, Janna Hastings, Daniel M Aebersold, Hendrik von Tengg-Kobligk, Knud Nairz","doi":"10.1007/s10278-025-01659-4","DOIUrl":null,"url":null,"abstract":"<p><p>Large language models (LLMs) have been successfully used for data extraction from free-text radiology reports. Most current studies were conducted with LLMs accessed via an application programming interface (API). We evaluated the feasibility of using open-source LLMs, deployed on limited local hardware resources for data extraction from free-text mammography reports, using a common data element (CDE)-based structure. Seventy-nine CDEs were defined by an interdisciplinary expert panel, reflecting real-world reporting practice. Sixty-one reports were classified by two independent researchers to establish ground truth. Five different open-source LLMs deployable on a single GPU were used for data extraction using the general-classifier Python package. Extractions were performed for five different prompt approaches with calculation of overall accuracy, micro-recall and micro-F1. Additional analyses were conducted using thresholds for the relative probability of classifications. High inter-rater agreement was observed between manual classifiers (Cohen's kappa 0.83). Using default prompts, the LLMs achieved accuracies of 59.2-72.9%. Chain-of-thought prompting yielded mixed results, while few-shot prompting led to decreased accuracy. Adaptation of the default prompts to precisely define classification tasks improved performance for all models, with accuracies of 64.7-85.3%. Setting certainty thresholds further improved accuracies to > 90% but reduced the coverage rate to < 50%. Locally deployed open-source LLMs can effectively extract information from mammography reports, maintaining compatibility with limited computational resources. Selection and evaluation of the model and prompting strategy are critical. Clear, task-specific instructions appear crucial for high performance. Using a CDE-based framework provides clear semantics and structure for the data extraction.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of imaging informatics in medicine","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s10278-025-01659-4","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Large language models (LLMs) have been successfully used for data extraction from free-text radiology reports. Most studies to date were conducted with LLMs accessed via an application programming interface (API). We evaluated the feasibility of using open-source LLMs deployed on limited local hardware resources for data extraction from free-text mammography reports, using a common data element (CDE)-based structure. Seventy-nine CDEs were defined by an interdisciplinary expert panel, reflecting real-world reporting practice. Sixty-one reports were classified by two independent researchers to establish ground truth. Five different open-source LLMs deployable on a single GPU were used for data extraction with the general-classifier Python package. Extractions were performed with five different prompting approaches, and overall accuracy, micro-recall, and micro-F1 were calculated. Additional analyses applied thresholds to the relative probability of classifications. High inter-rater agreement was observed between the manual classifiers (Cohen's kappa 0.83). Using default prompts, the LLMs achieved accuracies of 59.2-72.9%. Chain-of-thought prompting yielded mixed results, while few-shot prompting decreased accuracy. Adapting the default prompts to precisely define the classification tasks improved performance for all models, with accuracies of 64.7-85.3%. Setting certainty thresholds further improved accuracies to >90% but reduced the coverage rate to <50%. Locally deployed open-source LLMs can effectively extract information from mammography reports while remaining compatible with limited computational resources. Selection and evaluation of the model and prompting strategy are critical, and clear, task-specific instructions appear crucial for high performance. A CDE-based framework provides clear semantics and structure for the data extraction.
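The abstract describes the core technique at a high level: each CDE is cast as a closed-set classification task over a free-text report, the relative probability of each candidate value is derived from the model's token likelihoods, and a certainty threshold lets the system abstain when no label is clearly favored. The sketch below illustrates that idea using the Hugging Face transformers API rather than the general-classifier package used in the study (whose interface is not reproduced here); the model name, the example CDE, the candidate labels, the prompt wording, and the 0.8 threshold are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch (not the study's implementation): classify one CDE from a
# mammography report with a locally hosted open-source causal LLM by scoring
# each candidate label's likelihood and applying a certainty threshold.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder: any single-GPU open-source LLM
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"  # device_map requires the accelerate package
)

def classify_cde(report: str, cde: str, labels: list[str], threshold: float = 0.8):
    """Score each candidate label by the log-likelihood the model assigns to it
    after a task-specific prompt; return None if the best label's relative
    probability falls below the certainty threshold (abstain)."""
    prompt = (
        f"Report:\n{report}\n\n"
        f"Question: What is the value of '{cde}' in this report? "
        f"Answer with exactly one of: {', '.join(labels)}.\nAnswer:"
    )
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    scores = []
    for label in labels:
        label_ids = tokenizer(
            " " + label, add_special_tokens=False, return_tensors="pt"
        ).input_ids.to(model.device)
        input_ids = torch.cat([prompt_ids, label_ids], dim=-1)
        with torch.no_grad():
            logits = model(input_ids).logits
        # Log-probabilities at the positions that predict the label tokens
        log_probs = torch.log_softmax(logits[0, prompt_ids.shape[-1] - 1 : -1], dim=-1)
        scores.append(log_probs.gather(1, label_ids[0].unsqueeze(-1)).sum())
    # Relative probability of each label, normalized over the candidate set
    probs = torch.softmax(torch.stack(scores), dim=0)
    best = int(torch.argmax(probs))
    if probs[best] >= threshold:
        return labels[best], float(probs[best])
    return None, float(probs[best])  # abstain: trades coverage for accuracy
```

Raising the threshold makes the classifier abstain more often, which matches the trade-off reported above: accuracies above 90% at the cost of a coverage rate below 50%.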
