Generalizable and scalable multistage biomedical concept normalization leveraging large language models.

IF 6.1, Zone 2 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Research Synthesis Methods Pub Date : 2025-05-01 DOI:10.1017/rsm.2025.9

Nicholas J Dobbins

Research Synthesis Methods, volume 16, issue 3, pages 479-490. Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12527512/pdf/

Citations: 0

Abstract

Background: Biomedical entity normalization is critical to biomedical research because the richness of free-text clinical data, such as progress notes, can often be fully leveraged only after words and phrases are translated into structured, coded representations suitable for analysis. Large language models (LLMs), in turn, have shown great potential and high performance in a variety of natural language processing (NLP) tasks, but their application to normalization remains understudied.

Methods: We applied both proprietary and open-source LLMs in combination with several rule-based normalization systems commonly used in biomedical research. We used a two-step LLM integration approach: (1) using an LLM to generate alternative phrasings of a source utterance, and (2) using an LLM to prune candidate UMLS concepts, using a variety of prompting methods. We measured results by $F_{\beta}$, weighted to favor recall over precision, and by F1.
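The two-step integration described in the Methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the toy paraphrase table, the toy lexicon, and the placeholder concept identifiers (`CUI_MI`, `CUI_CARDIAC_ARREST`) are all hypothetical. In a real system, `generate_alternatives` and the `keep` judgment would be LLM calls, and `lookup_candidates` would be an existing normalizer such as a dictionary matcher or a retrieval index.

```python
from typing import Callable, Dict, List, Set

# Toy data standing in for an LLM and a normalizer. All values are hypothetical.
TOY_PARAPHRASES: Dict[str, List[str]] = {
    "heart attack": ["myocardial infarction", "cardiac infarction"],
}
TOY_LEXICON: Dict[str, Set[str]] = {
    "myocardial infarction": {"CUI_MI"},
    "cardiac infarction": {"CUI_MI", "CUI_CARDIAC_ARREST"},
}


def generate_alternatives(term: str) -> List[str]:
    """Step 1 stand-in: an LLM would propose alternative phrasings here."""
    return list(TOY_PARAPHRASES.get(term.lower(), []))


def lookup_candidates(phrase: str) -> Set[str]:
    """Stand-in for an existing normalization system returning candidate concepts."""
    return set(TOY_LEXICON.get(phrase.lower(), set()))


def prune_candidates(
    term: str,
    context: str,
    candidates: Set[str],
    keep: Callable[[str, str, str], bool],
) -> Set[str]:
    """Step 2 stand-in: an LLM would judge each candidate against the context."""
    return {cui for cui in candidates if keep(term, context, cui)}


def normalize(term: str, context: str, keep: Callable[[str, str, str], bool]) -> Set[str]:
    """Expand the term, pool candidates from every phrasing, then prune."""
    phrases = [term] + generate_alternatives(term)
    candidates: Set[str] = set()
    for phrase in phrases:
        candidates |= lookup_candidates(phrase)
    return prune_candidates(term, context, candidates, keep)


def toy_keep(term: str, context: str, cui: str) -> bool:
    """Hypothetical pruning rule used only to make the sketch runnable."""
    return cui == "CUI_MI"


result = normalize("heart attack", "Patient admitted after a heart attack.", toy_keep)
print(sorted(result))  # ['CUI_MI']
```

In this toy run the original term matches nothing in the lexicon, so every candidate comes from the alternative phrasings. That illustrates why expansion tends to raise recall, while the pruning step removes the extra candidate that expansion brought in.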

Results: We evaluated a total of 5,523 concept terms and text contexts from a publicly available dataset of human-annotated biomedical abstracts. Incorporating GPT-3.5-turbo increased overall $F_{\beta}$ and F1 across normalization systems by +16.5 and +16.2 (OpenAI embeddings), +9.5 and +7.3 (MetaMapLite), +13.9 and +10.9 (QuickUMLS), and +10.5 and +10.3 (BM25), while the open-source Vicuna model achieved gains of +20.2 and +21.7 (OpenAI embeddings), +10.8 and +12.2 (MetaMapLite), +14.7 and +15 (QuickUMLS), and +15.6 and +18.7 (BM25).
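For readers unfamiliar with the two metrics, the standard definition is $F_{\beta} = (1+\beta^2) \cdot P \cdot R / (\beta^2 \cdot P + R)$, where $P$ is precision and $R$ is recall; F1 is the special case $\beta = 1$, and $\beta > 1$ weights recall more heavily. The abstract does not state the value of $\beta$ that was used, so the sketch below assumes $\beta = 2$ purely for illustration. The precision and recall values are made-up inputs, not results from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score; beta > 1 favors recall, beta = 1 gives F1."""
    if precision < 0 or recall < 0 or beta <= 0:
        raise ValueError("precision and recall must be >= 0 and beta must be > 0")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero, so the score is defined as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Illustrative inputs only (not taken from the paper).
precision, recall = 0.5, 0.8
f1 = f_beta(precision, recall)            # beta = 1
f2 = f_beta(precision, recall, beta=2.0)  # assumed beta = 2, recall-weighted
print(round(f1, 4), round(f2, 4))  # 0.6154 0.7143
```

Because recall exceeds precision in this example, the recall-weighted score is higher than F1, which is the behavior a recall-favoring $\beta$ is meant to produce.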

Conclusions: Existing general-purpose LLMs, both proprietary and open-source, can be leveraged to greatly improve normalization performance using existing tools, with no fine-tuning.




Source journal

Research Synthesis Methods (MATHEMATICAL & COMPUTATIONAL BIOLOGY; MULTIDISCIPLINARY SCIENCES)

CiteScore

16.90

Self-citation rate

3.10%

Publication count

About the journal: Research Synthesis Methods is a reputable, peer-reviewed journal that focuses on the development and dissemination of methods for conducting systematic research synthesis. Our aim is to advance the knowledge and application of research synthesis methods across various disciplines. Our journal provides a platform for the exchange of ideas and knowledge related to designing, conducting, analyzing, interpreting, reporting, and applying research synthesis. While research synthesis is commonly practiced in the health and social sciences, our journal also welcomes contributions from other fields to enrich the methodologies employed in research synthesis across scientific disciplines. By bridging different disciplines, we aim to foster collaboration and cross-fertilization of ideas, ultimately enhancing the quality and effectiveness of research synthesis methods. Whether you are a researcher, practitioner, or stakeholder involved in research synthesis, our journal strives to offer valuable insights and practical guidance for your work.