Comparative analysis of generative LLMs for labeling entities in clinical notes.

Genomics & informatics Pub Date : 2025-02-06 DOI:10.1186/s44342-024-00036-x

Rodrigo Del Moral-González, Helena Gómez-Adorno, Orlando Ramos-Flores


Abstract

This paper evaluates and compares fine-tuned variants of generative large language models (LLMs) on zero-shot named entity recognition (NER) in the clinical domain. As part of the 8th Biomedical Linked Annotation Hackathon, we examined Llama 2 and Mistral models, including base versions and versions fine-tuned for code, chat, and instruction following. We assess both the number of correctly identified entities and the models' ability to return entities in structured formats. For the evaluation, we used a publicly available set of clinical cases labeled with mentions of diseases, symptoms, and medical procedures. Results show that instruction fine-tuned models recognize entities better than chat fine-tuned and base models. Models also perform better when simple output structures are requested.
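
The two evaluation axes named in the abstract (correctly identified entities, and adherence to a requested output structure) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the JSON-list output format, the lowercase exact-match comparison, and the helper names `parse_entities` and `score` are hypothetical, and the authors' actual prompts and metrics may differ.

```python
import json


def parse_entities(raw_output):
    """Parse a model response that was asked to be a JSON list of strings.

    Returns (entities, ok). ok is False when the response does not follow
    the requested structure (an assumed format, not the paper's).
    """
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        # Free text or malformed JSON: the structure was not respected.
        return set(), False
    if not isinstance(data, list) or not all(isinstance(x, str) for x in data):
        return set(), False
    # Normalize so that "Fever" and "fever " count as the same mention.
    return {x.strip().lower() for x in data if x.strip()}, True


def score(predicted, gold):
    """Exact-match precision, recall, and F1 over sets of entity strings."""
    gold_norm = {g.strip().lower() for g in gold}
    tp = len(predicted & gold_norm)  # correctly identified entities
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold_norm) if gold_norm else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"tp": tp, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical model response and gold annotations for one clinical case.
    response = '["Fever", "chest pain", "biopsy"]'
    gold = ["fever", "biopsy", "pneumonia", "cough"]
    entities, ok = parse_entities(response)
    print("structured output respected:", ok)
    print(score(entities, gold))
```

Keeping the format check separate from the entity score lets a response that names the right entities but ignores the requested structure be counted as a format failure rather than silently scored as zero matches.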


