Medical Entity Linking in Low-Resource Settings with Fine-Tuning-Free LLMs.

Suteera Seeha, Martin Boeker, Luise Modersohn
{"title":"Medical Entity Linking in Low-Resource Settings with Fine-Tuning-Free LLMs.","authors":"Suteera Seeha, Martin Boeker, Luise Modersohn","doi":"10.3233/SHTI251402","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>Medical entity linking is an important task in biomedical natural language processing, aiming to align textual mentions of medical concepts with standardized concepts in ontologies. Most existing approaches rely on supervised models or domain-specific embeddings, which require large datasets and significant computational resources.</p><p><strong>Objective: </strong>The objective of this work is (1) to investigate the effectiveness of large language models (LLMs) in improving both candidate generation and disambiguation for medical entity linking through synonym expansion and in-context learning, and (2) to evaluate this approach against traditional string-matching and supervised methods.</p><p><strong>Methods: </strong>We propose a simple yet effective approach that combines string matching with an LLM through in-context learning. Our method avoids fine-tuning and minimizes annotation requirements, making it suitable for low-resource settings. Our system enhances fuzzy string matching by expanding mention spans with LLM-generated synonyms during candidate generation. UMLS entity names, aliases, and synonyms are indexed in Elasticsearch, and candidates are retrieved using both the original span and generated variants. Disambiguation is performed using an LLM with few-shot prompting to select the correct entity from the candidate list.</p><p><strong>Results: </strong>Evaluated on the MedMentions dataset, our approach achieves 56% linking accuracy, outperforming baseline string matching but falling behind supervised learning methods. The candidate generation component reaches 70% recall@5, while the disambiguation step achieves 80% accuracy when the correct entity is among the top five. 
We also observe that LLM-generated descriptions do not always improve accuracy.</p><p><strong>Conclusion: </strong>Our results demonstrate that LLMs have the potential to support medical entity linking in low-resource settings. Although our method is still outperformed by supervised models, it remains a lightweight alternative, requiring no fine-tuning or a large amount of annotated data. The approach is also adaptable to other domains and ontologies beyond biomedicine due to its flexible and domain-agnostic design.</p>","PeriodicalId":94357,"journal":{"name":"Studies in health technology and informatics","volume":"331 ","pages":"245-254"},"PeriodicalIF":0.0000,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Studies in health technology and informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3233/SHTI251402","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Introduction: Medical entity linking is an important task in biomedical natural language processing, aiming to align textual mentions of medical concepts with standardized concepts in ontologies. Most existing approaches rely on supervised models or domain-specific embeddings, which require large datasets and significant computational resources.

Objective: The objective of this work is (1) to investigate the effectiveness of large language models (LLMs) in improving both candidate generation and disambiguation for medical entity linking through synonym expansion and in-context learning, and (2) to evaluate this approach against traditional string-matching and supervised methods.

Methods: We propose a simple yet effective approach that combines string matching with an LLM through in-context learning. Our method avoids fine-tuning and minimizes annotation requirements, making it suitable for low-resource settings. Our system enhances fuzzy string matching by expanding mention spans with LLM-generated synonyms during candidate generation. UMLS entity names, aliases, and synonyms are indexed in Elasticsearch, and candidates are retrieved using both the original span and generated variants. Disambiguation is performed using an LLM with few-shot prompting to select the correct entity from the candidate list.
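The pipeline described above can be sketched in miniature. The snippet below is a simplified illustration, not the authors' implementation: `difflib.SequenceMatcher` stands in for Elasticsearch fuzzy matching, the tiny `UMLS_INDEX` dictionary and the pre-supplied synonym list are hypothetical placeholders for the real UMLS index and the LLM synonym-generation call.

```python
from difflib import SequenceMatcher

# Hypothetical mini-index standing in for the Elasticsearch UMLS index;
# real entries would map UMLS CUIs to names, aliases, and synonyms.
UMLS_INDEX = {
    "C0020538": ["hypertension", "high blood pressure", "HTN"],
    "C0011849": ["diabetes mellitus", "diabetes", "DM"],
    "C0027051": ["myocardial infarction", "heart attack", "MI"],
}

def fuzzy_score(a: str, b: str) -> float:
    """Character-level similarity as a stand-in for Elasticsearch fuzzy matching."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def generate_candidates(mention: str, llm_synonyms: list[str], k: int = 5) -> list[str]:
    """Retrieve the top-k candidate entities, querying with both the original
    mention span and LLM-generated synonym variants (synonym expansion)."""
    variants = [mention] + llm_synonyms
    scored = {}
    for cui, names in UMLS_INDEX.items():
        # Keep the best score over every (variant, entity name) pair.
        scored[cui] = max(fuzzy_score(v, n) for v in variants for n in names)
    return sorted(scored, key=scored.get, reverse=True)[:k]

# Usage: the mention plus one (hypothetical) LLM-generated synonym.
candidates = generate_candidates("heart attack", ["myocardial infarction"])
```

In the full system, the candidate list produced here would then be placed into a few-shot prompt, and the LLM would be asked to select the single best-matching entity.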

Results: Evaluated on the MedMentions dataset, our approach achieves 56% linking accuracy, outperforming baseline string matching but falling behind supervised learning methods. The candidate generation component reaches 70% recall@5, while the disambiguation step achieves 80% accuracy when the correct entity is among the top five. We also observe that LLM-generated descriptions do not always improve accuracy.
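The two reported metrics decompose the end-to-end task: recall@5 measures whether candidate generation surfaces the gold entity at all, while disambiguation accuracy is conditioned on it being present. A minimal sketch with hypothetical toy data (not the paper's evaluation data):

```python
def recall_at_k(gold_cuis: list[str], candidate_lists: list[list[str]], k: int = 5) -> float:
    """Fraction of mentions whose gold entity appears in the top-k candidates."""
    hits = sum(g in cands[:k] for g, cands in zip(gold_cuis, candidate_lists))
    return hits / len(gold_cuis)

def disambiguation_accuracy(gold_cuis: list[str], predictions: list[str],
                            candidate_lists: list[list[str]], k: int = 5) -> float:
    """Accuracy of the disambiguation step, measured only on mentions where
    the gold entity survived candidate generation (is in the top-k list)."""
    eligible = [(g, p) for g, p, c in zip(gold_cuis, predictions, candidate_lists)
                if g in c[:k]]
    return sum(g == p for g, p in eligible) / len(eligible)

# Hypothetical toy data: three mentions with gold CUIs, retrieved candidates,
# and the disambiguator's picks.
gold = ["C1", "C2", "C3"]
cands = [["C1", "C9"], ["C7", "C2"], ["C8", "C9"]]
preds = ["C1", "C2", "C4"]

print(recall_at_k(gold, cands))                      # 2/3: C3 was missed by retrieval
print(disambiguation_accuracy(gold, preds, cands))   # 1.0 on the two eligible mentions
```

This separation explains why end-to-end accuracy (56%) sits well below both component scores: overall accuracy is bounded by retrieval recall times conditional disambiguation accuracy.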

Conclusion: Our results demonstrate that LLMs have the potential to support medical entity linking in low-resource settings. Although our method is still outperformed by supervised models, it remains a lightweight alternative, requiring neither fine-tuning nor large amounts of annotated data. The approach is also adaptable to other domains and ontologies beyond biomedicine due to its flexible and domain-agnostic design.
