{"title":"Generalizable and scalable multistage biomedical concept normalization leveraging large language models.","authors":"Nicholas J Dobbins","doi":"10.1017/rsm.2025.9","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Biomedical entity normalization is critical to biomedical research because the richness of free-text clinical data, such as progress notes, can often be fully leveraged only after translating words and phrases into structured and coded representations suitable for analysis. Large Language Models (LLMs), in turn, have shown great potential and high performance in a variety of natural language processing (NLP) tasks, but their application for normalization remains understudied.</p><p><strong>Methods: </strong>We applied both proprietary and open-source LLMs in combination with several rule-based normalization systems commonly used in biomedical research. We used a two-step LLM integration approach, (1) using an LLM to generate alternative phrasings of a source utterance, and (2) to prune candidate UMLS concepts, using a variety of prompting methods. We measure results by $F_{\\beta }$, where we favor recall over precision, and F1.</p><p><strong>Results: </strong>We evaluated a total of 5,523 concept terms and text contexts from a publicly available dataset of human-annotated biomedical abstracts. Incorporating GPT-3.5-turbo increased overall $F_{\\beta }$ and F1 in normalization systems +16.5 and +16.2 (OpenAI embeddings), +9.5 and +7.3 (MetaMapLite), +13.9 and +10.9 (QuickUMLS), and +10.5 and +10.3 (BM25), while the open-source Vicuna model achieved +20.2 and +21.7 (OpenAI embeddings), +10.8 and +12.2 (MetaMapLite), +14.7 and +15 (QuickUMLS), and +15.6 and +18.7 (BM25).</p><p><strong>Conclusions: </strong>Existing general-purpose LLMs, both propriety and open-source, can be leveraged to greatly improve normalization performance using existing tools, with no fine-tuning.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"16 3","pages":"479-490"},"PeriodicalIF":6.1000,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12527512/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Research Synthesis Methods","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1017/rsm.2025.9","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Biomedical entity normalization is critical to biomedical research because the richness of free-text clinical data, such as progress notes, can often be fully leveraged only after translating words and phrases into structured and coded representations suitable for analysis. Large Language Models (LLMs), in turn, have shown great potential and high performance in a variety of natural language processing (NLP) tasks, but their application for normalization remains understudied.
Methods: We applied both proprietary and open-source LLMs in combination with several rule-based normalization systems commonly used in biomedical research. We used a two-step LLM integration approach, (1) using an LLM to generate alternative phrasings of a source utterance, and (2) to prune candidate UMLS concepts, using a variety of prompting methods. We measure results by $F_{\beta }$, where we favor recall over precision, and F1.
Results: We evaluated a total of 5,523 concept terms and text contexts from a publicly available dataset of human-annotated biomedical abstracts. Incorporating GPT-3.5-turbo increased overall $F_{\beta }$ and F1 in normalization systems +16.5 and +16.2 (OpenAI embeddings), +9.5 and +7.3 (MetaMapLite), +13.9 and +10.9 (QuickUMLS), and +10.5 and +10.3 (BM25), while the open-source Vicuna model achieved +20.2 and +21.7 (OpenAI embeddings), +10.8 and +12.2 (MetaMapLite), +14.7 and +15 (QuickUMLS), and +15.6 and +18.7 (BM25).
Conclusions: Existing general-purpose LLMs, both propriety and open-source, can be leveraged to greatly improve normalization performance using existing tools, with no fine-tuning.
期刊介绍:
Research Synthesis Methods is a reputable, peer-reviewed journal that focuses on the development and dissemination of methods for conducting systematic research synthesis. Our aim is to advance the knowledge and application of research synthesis methods across various disciplines.
Our journal provides a platform for the exchange of ideas and knowledge related to designing, conducting, analyzing, interpreting, reporting, and applying research synthesis. While research synthesis is commonly practiced in the health and social sciences, our journal also welcomes contributions from other fields to enrich the methodologies employed in research synthesis across scientific disciplines.
By bridging different disciplines, we aim to foster collaboration and cross-fertilization of ideas, ultimately enhancing the quality and effectiveness of research synthesis methods. Whether you are a researcher, practitioner, or stakeholder involved in research synthesis, our journal strives to offer valuable insights and practical guidance for your work.