A Large Language Model to Detect Negated Expressions in Radiology Reports.

Yvonne Su, Yonatan B Babore, Charles E Kahn
{"title":"A Large Language Model to Detect Negated Expressions in Radiology Reports.","authors":"Yvonne Su, Yonatan B Babore, Charles E Kahn","doi":"10.1007/s10278-024-01274-9","DOIUrl":null,"url":null,"abstract":"<p><p>Natural language processing (NLP) is crucial to extract information accurately from unstructured text to provide insights for clinical decision-making, quality improvement, and medical research. This study compared the performance of a rule-based NLP system and a medical-domain transformer-based model to detect negated concepts in radiology reports. Using a corpus of 984 de-identified radiology reports from a large U.S.-based academic health system (1000 consecutive reports, excluding 16 duplicates), the investigators compared the rule-based medspaCy system and the Clinical Assertion and Negation Classification Bidirectional Encoder Representations from Transformers (CAN-BERT) system to detect negated expressions of terms from RadLex, the Unified Medical Language System Metathesaurus, and the Radiology Gamuts Ontology. Power analysis determined a sample size of 382 terms to achieve α = 0.05 and β = 0.8 for McNemar's test; based on an estimate of 15% negated terms, 2800 randomly selected terms were annotated manually as negated or not negated. Precision, recall, and F1 of the two models were compared using McNemar's test. Of the 2800 terms, 387 (13.8%) were negated. For negation detection, medspaCy attained a recall of 0.795, precision of 0.356, and F1 of 0.492. CAN-BERT achieved a recall of 0.785, precision of 0.768, and F1 of 0.777. Although recall was not significantly different, CAN-BERT had significantly better precision (χ2 = 304.64; p < 0.001). The transformer-based CAN-BERT model detected negated terms in radiology reports with high precision and recall; its precision significantly exceeded that of the rule-based medspaCy system. Use of this system will improve data extraction from textual reports to support information retrieval, AI model training, and discovery of causal relationships.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of imaging informatics in medicine","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s10278-024-01274-9","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Natural language processing (NLP) is crucial for accurately extracting information from unstructured text to support clinical decision-making, quality improvement, and medical research. This study compared the performance of a rule-based NLP system and a medical-domain transformer-based model in detecting negated concepts in radiology reports. Using a corpus of 984 de-identified radiology reports from a large U.S.-based academic health system (1000 consecutive reports, excluding 16 duplicates), the investigators compared the rule-based medspaCy system and the Clinical Assertion and Negation Classification Bidirectional Encoder Representations from Transformers (CAN-BERT) system in detecting negated expressions of terms from RadLex, the Unified Medical Language System Metathesaurus, and the Radiology Gamuts Ontology. Power analysis determined a sample size of 382 terms to achieve α = 0.05 and power (1 − β) of 0.8 for McNemar's test; based on an estimate that 15% of terms would be negated, 2800 randomly selected terms were annotated manually as negated or not negated. The precision, recall, and F1 of the two models were compared using McNemar's test. Of the 2800 terms, 387 (13.8%) were negated. For negation detection, medspaCy attained a recall of 0.795, precision of 0.356, and F1 of 0.492; CAN-BERT achieved a recall of 0.785, precision of 0.768, and F1 of 0.777. Although recall did not differ significantly between the two systems, CAN-BERT's precision was significantly better (χ² = 304.64; p < 0.001). The transformer-based CAN-BERT model detected negated terms in radiology reports with high precision and recall, and its precision significantly exceeded that of the rule-based medspaCy system. Use of this system will improve data extraction from textual reports to support information retrieval, AI model training, and discovery of causal relationships.
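
The rule-based arm of the comparison can be illustrated with medspaCy, which wraps spaCy with a clinical ConText component that flags negated entities. The sketch below is a minimal illustration rather than the authors' pipeline; the target terms and the example sentence are hypothetical, whereas the study matched terms from RadLex, the UMLS Metathesaurus, and the Radiology Gamuts Ontology against full reports.

```python
# Minimal sketch of rule-based negation detection with medspaCy.
import medspacy
from medspacy.target_matcher import TargetRule

# The default pipeline includes the ConText component that assigns
# negation attributes to matched entities.
nlp = medspacy.load()

# Register the vocabulary terms to check for negation (illustrative).
target_matcher = nlp.get_pipe("medspacy_target_matcher")
target_matcher.add([TargetRule("pneumothorax", "FINDING"),
                    TargetRule("pleural effusion", "FINDING")])

doc = nlp("No pneumothorax. Small left pleural effusion is present.")
for ent in doc.ents:
    # ent._.is_negated is set by medspaCy's ConText rules
    # (e.g., the trigger "No" negates "pneumothorax").
    print(ent.text, "negated" if ent._.is_negated else "affirmed")
```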
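Given the manual annotations and each system's predictions over the sampled terms, the reported metrics and the paired significance test can be computed as below. This is a generic sketch with hypothetical label arrays; building McNemar's 2×2 table from the paired correctness of the two classifiers is one standard construction, not necessarily the authors' exact procedure.

```python
# Sketch: precision/recall/F1 per system, plus McNemar's test on the
# paired correctness of the two systems. Arrays are hypothetical;
# 1 = term labeled negated, 0 = not negated.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support
from statsmodels.stats.contingency_tables import mcnemar

y_true = np.array([1, 0, 0, 1, 0, 1, 0, 0])        # manual annotations
pred_medspacy = np.array([1, 1, 0, 1, 1, 0, 0, 0])  # rule-based output
pred_canbert = np.array([1, 0, 0, 1, 0, 0, 0, 0])   # transformer output

for name, pred in [("medspaCy", pred_medspacy), ("CAN-BERT", pred_canbert)]:
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, pred, average="binary", zero_division=0)
    print(f"{name}: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")

# McNemar's test: the off-diagonal cells count terms where exactly
# one system was correct.
a_ok = pred_medspacy == y_true
b_ok = pred_canbert == y_true
table = [[np.sum(a_ok & b_ok), np.sum(a_ok & ~b_ok)],
         [np.sum(~a_ok & b_ok), np.sum(~a_ok & ~b_ok)]]
result = mcnemar(table, exact=False, correction=True)
print(f"chi2={result.statistic:.2f}, p={result.pvalue:.4g}")
```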
