Evaluating the Performance of Large Language Models for Named Entity Recognition in Ophthalmology Clinical Free-Text Notes

Iyad Majid, Vaibhav Mishra, Rohith Ravindranath, Sophia Y Wang

AMIA Annual Symposium Proceedings, 2024, 778-787. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12099357/pdf/
Abstract
This study compared large language models (LLMs) with fine-tuned Bidirectional Encoder Representations from Transformers (BERT) models at identifying medication names, routes, and frequencies in publicly available free-text ophthalmology progress notes from 480 patients. A total of 5,520 annotated lines of text were divided into training (N=3,864), validation (N=1,104), and test (N=552) sets. We evaluated GPT-3.5, GPT-4, PaLM 2, and Gemini on identifying these medication entities, and we fine-tuned BERT, BioBERT, ClinicalBERT, DistilBERT, and RoBERTa on the training set for the same task. On the test set, GPT-4 achieved the best overall performance (macro-averaged F1 0.962); among the BERT variants, BioBERT performed best (macro-averaged F1 0.875). Modern LLMs outperformed fine-tuned BERT models even on the highly domain-specific task of extracting ophthalmic medication information from progress notes, showcasing the potential of LLMs for medical named entity recognition to enhance patient care.
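For readers curious about the BERT-side baseline, the sketch below shows one plausible setup rather than the authors' actual pipeline: a BERT-family checkpoint fine-tuned for token classification over a hypothetical BIO tag set covering the three entity types (medication name, route, frequency), with the standard subword/label alignment and a macro-averaged F1 metric like the one reported above. The checkpoint ID, label names, and helper functions are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch of a BERT-style medication NER baseline as token
# classification. Not the authors' code; the checkpoint, BIO labels,
# and helpers below are assumptions for illustration only.
from sklearn.metrics import f1_score
from transformers import AutoTokenizer, AutoModelForTokenClassification

LABELS = ["O", "B-MED", "I-MED", "B-ROUTE", "I-ROUTE", "B-FREQ", "I-FREQ"]
label2id = {l: i for i, l in enumerate(LABELS)}
id2label = {i: l for l, i in label2id.items()}

CHECKPOINT = "dmis-lab/biobert-v1.1"  # assumed BioBERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForTokenClassification.from_pretrained(
    CHECKPOINT, num_labels=len(LABELS),
    id2label=id2label, label2id=label2id)

def encode(words, word_labels):
    """Tokenize pre-split words and align BIO labels to subwords.

    Special tokens and continuation subwords get the ignore index
    -100 so the loss and metrics score each word exactly once.
    """
    enc = tokenizer(words, is_split_into_words=True, truncation=True)
    aligned, prev = [], None
    for wid in enc.word_ids():
        if wid is None:
            aligned.append(-100)                     # [CLS]/[SEP]
        elif wid != prev:
            aligned.append(label2id[word_labels[wid]])
        else:
            aligned.append(-100)                     # later subword
        prev = wid
    enc["labels"] = aligned
    return enc

def macro_f1(true_ids, pred_ids):
    """Macro-averaged F1 over label classes, dropping ignored
    positions; this mirrors the headline metric in the abstract."""
    keep = [i for i, t in enumerate(true_ids) if t != -100]
    return f1_score([true_ids[i] for i in keep],
                    [pred_ids[i] for i in keep],
                    average="macro")
```

Training would then feed the encoded splits through an ordinary fine-tuning loop (e.g., the Hugging Face Trainer). The split sizes quoted in the abstract (3,864 / 1,104 / 552 lines) suggest each annotated line of a note served as one example.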