{"title":"Medical Entity Linking in Low-Resource Settings with Fine-Tuning-Free LLMs.","authors":"Suteera Seeha, Martin Boeker, Luise Modersohn","doi":"10.3233/SHTI251402","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>Medical entity linking is an important task in biomedical natural language processing, aiming to align textual mentions of medical concepts with standardized concepts in ontologies. Most existing approaches rely on supervised models or domain-specific embeddings, which require large datasets and significant computational resources.</p><p><strong>Objective: </strong>The objective of this work is (1) to investigate the effectiveness of large language models (LLMs) in improving both candidate generation and disambiguation for medical entity linking through synonym expansion and in-context learning, and (2) to evaluate this approach against traditional string-matching and supervised methods.</p><p><strong>Methods: </strong>We propose a simple yet effective approach that combines string matching with an LLM through in-context learning. Our method avoids fine-tuning and minimizes annotation requirements, making it suitable for low-resource settings. Our system enhances fuzzy string matching by expanding mention spans with LLM-generated synonyms during candidate generation. UMLS entity names, aliases, and synonyms are indexed in Elasticsearch, and candidates are retrieved using both the original span and generated variants. Disambiguation is performed using an LLM with few-shot prompting to select the correct entity from the candidate list.</p><p><strong>Results: </strong>Evaluated on the MedMentions dataset, our approach achieves 56% linking accuracy, outperforming baseline string matching but falling behind supervised learning methods. The candidate generation component reaches 70% recall@5, while the disambiguation step achieves 80% accuracy when the correct entity is among the top five. We also observe that LLM-generated descriptions do not always improve accuracy.</p><p><strong>Conclusion: </strong>Our results demonstrate that LLMs have the potential to support medical entity linking in low-resource settings. Although our method is still outperformed by supervised models, it remains a lightweight alternative, requiring no fine-tuning or a large amount of annotated data. The approach is also adaptable to other domains and ontologies beyond biomedicine due to its flexible and domain-agnostic design.</p>","PeriodicalId":94357,"journal":{"name":"Studies in health technology and informatics","volume":"331 ","pages":"245-254"},"PeriodicalIF":0.0000,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Studies in health technology and informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3233/SHTI251402","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Introduction: Medical entity linking is an important task in biomedical natural language processing, aligning textual mentions of medical concepts with their standardized counterparts in ontologies. Most existing approaches rely on supervised models or domain-specific embeddings, which require large datasets and significant computational resources.
Objective: The objective of this work is (1) to investigate the effectiveness of large language models (LLMs) in improving both candidate generation and disambiguation for medical entity linking through synonym expansion and in-context learning, and (2) to evaluate this approach against traditional string-matching and supervised methods.
Methods: We propose a simple yet effective approach that combines string matching with an LLM through in-context learning. Our method avoids fine-tuning and minimizes annotation requirements, making it suitable for low-resource settings. Our system enhances fuzzy string matching by expanding mention spans with LLM-generated synonyms during candidate generation. UMLS entity names, aliases, and synonyms are indexed in Elasticsearch, and candidates are retrieved using both the original span and generated variants. Disambiguation is performed using an LLM with few-shot prompting to select the correct entity from the candidate list.
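The following is a minimal Python sketch of the two LLM-assisted steps described above: synonym-expanded candidate retrieval from Elasticsearch and few-shot disambiguation. The index name `umls_concepts`, the `name` field, the use of the CUI as document ID, and the `generate_synonyms` / `llm_complete` stubs are illustrative assumptions, not details taken from the paper.

```python
# Sketch of synonym-expanded candidate generation and LLM disambiguation.
# Index layout, function names, and prompt wording are assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local instance

def generate_synonyms(span: str) -> list[str]:
    # Stub for the LLM synonym-expansion call; replace with a real client.
    # Returning [] degrades the pipeline to plain fuzzy string matching.
    return []

def llm_complete(prompt: str) -> str:
    # Stub for the few-shot disambiguation call.
    raise NotImplementedError

def retrieve_candidates(span: str, top_k: int = 5) -> list[dict]:
    """Query the index with the original span and each generated synonym,
    keeping the best-scoring hit per concept (CUI assumed to be the _id)."""
    best: dict[str, dict] = {}
    for query in [span, *generate_synonyms(span)]:
        resp = es.search(
            index="umls_concepts",
            query={"match": {"name": {"query": query, "fuzziness": "AUTO"}}},
            size=top_k,
        )
        for hit in resp["hits"]["hits"]:
            cui = hit["_id"]
            if cui not in best or hit["_score"] > best[cui]["_score"]:
                best[cui] = hit
    return sorted(best.values(), key=lambda h: h["_score"], reverse=True)[:top_k]

def disambiguate(span: str, context: str, cands: list[dict]) -> str:
    """Ask the LLM to pick one candidate; in the real prompt, few-shot
    examples would precede the query."""
    options = "\n".join(
        f"{i}. {c['_id']} {c['_source']['name']}" for i, c in enumerate(cands, 1)
    )
    prompt = (
        "Select the ontology concept that best matches the mention.\n"
        f"Mention: {span}\nContext: {context}\nCandidates:\n{options}\n"
        "Answer with the number of the best candidate."
    )
    choice = int(llm_complete(prompt).strip())
    return cands[choice - 1]["_id"]
```

Deduplicating by CUI before re-ranking keeps one entry per concept even when several aliases of the same concept match, so the top-k list offered to the LLM stays diverse.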
Results: Evaluated on the MedMentions dataset, our approach achieves 56% linking accuracy, outperforming baseline string matching but falling behind supervised learning methods. The candidate generation component reaches 70% recall@5, while the disambiguation step achieves 80% accuracy when the correct entity is among the top five. We also observe that LLM-generated descriptions do not always improve accuracy.
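For concreteness, the two reported metrics can be computed as in the sketch below; the data layout (one gold CUI, one candidate list, and one prediction per mention, in aligned lists) is an assumption, not the paper's evaluation code.

```python
# Illustrative computation of the two reported metrics.
def recall_at_k(gold: list[str], candidate_lists: list[list[str]], k: int = 5) -> float:
    # Fraction of mentions whose gold concept appears in the top-k candidates.
    return sum(g in cands[:k] for g, cands in zip(gold, candidate_lists)) / len(gold)

def linking_accuracy(gold: list[str], predicted: list[str]) -> float:
    # Fraction of mentions linked to exactly the gold concept.
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)
```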
Conclusion: Our results demonstrate that LLMs have the potential to support medical entity linking in low-resource settings. Although our method is still outperformed by supervised models, it remains a lightweight alternative, requiring neither fine-tuning nor large amounts of annotated data. Its flexible, domain-agnostic design also makes it adaptable to domains and ontologies beyond biomedicine.