Assessment and Integration of Large Language Models for Automated Electronic Health Record Documentation in Emergency Medical Services.

IF 5.7 | Zone 3 (Medicine) | Q1 | HEALTH CARE SCIENCES & SERVICES

Journal of Medical Systems, 49(1), 65 | Pub Date: 2025-05-17 | DOI: 10.1007/s10916-025-02197-w

Enze Bai, Xiao Luo, Zhan Zhang, Kathleen Adelgais, Humaira Ali, Jack Finkelstein, Jared Kutzin


Citations: 0

Abstract

Automating electronic health record (EHR) documentation can significantly reduce the burden on care providers, particularly in emergency care settings where rapid and accurate record-keeping is crucial. A critical aspect of this automation is using natural language processing (NLP) techniques to convert transcribed conversations into structured EHR fields, for instance extracting the temperature value "102.4 Fahrenheit" from the transcribed text "His temperature is 39.1, which is 102.4 Fahrenheit." However, traditional rule-based and single-model NLP approaches often struggle with domain-specific medical terminology, contextual ambiguity, and numerical extraction errors. This study investigates the potential of integrating multiple large language models (LLMs) to enhance the accuracy of emergency medical services (EMS) documentation. We developed an LLM integration framework and evaluated four state-of-the-art LLMs (Claude 3.5, GPT-4, Gemini, and Mistral) on a dataset of transcribed conversations from 40 EMS training simulations. The evaluation focused on precision, recall, and F1 score in zero-shot and few-shot learning scenarios. The integrated LLM framework outperformed the individual models, achieving overall F1 scores of 0.78 (zero-shot) and 0.81 (few-shot). In addition to the quantitative evaluation, a preliminary user study was conducted with domain experts to assess the perceived usefulness and challenges of the integrated framework. The findings suggest that this approach has the potential to reduce documentation effort compared to traditional manual documentation. However, challenges such as misinterpretation of medical context and occasional omissions were noted, highlighting areas for further refinement and future work. This research is the first to systematically explore and evaluate the use of LLMs for real-time EMS EHR documentation. By addressing key challenges in automated transcription and structured data extraction, our work lays a foundation for real-world implementation, improving efficiency and accuracy in emergency medical documentation.
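The abstract does not describe how the integration framework combines model outputs, so the following is only a minimal sketch of one plausible approach: a field-wise majority vote over per-model extractions, scored with field-level precision, recall, and F1. The function names, the model labels (`model_a` and so on), and every value other than the temperature example are made up for illustration and are not taken from the paper.

```python
from collections import Counter


def majority_vote(candidates):
    """Return the most common non-empty answer, or None if no model answered."""
    answers = [c.strip() for c in candidates if c and c.strip()]
    if not answers:
        return None
    # Ties are broken in favor of the answer seen first.
    return Counter(answers).most_common(1)[0][0]


def integrate(model_outputs):
    """Merge per-model field dictionaries by field-wise majority vote.

    ASSUMPTION: majority voting is a stand-in; the paper's actual
    integration method is not specified in the abstract.
    """
    fields = sorted({f for out in model_outputs.values() for f in out})
    merged = {}
    for field in fields:
        value = majority_vote([out.get(field) for out in model_outputs.values()])
        if value is not None:
            merged[field] = value
    return merged


def precision_recall_f1(predicted, gold):
    """Field-level scores using exact string match against the gold record.

    A wrong value counts as both a false positive and a false negative.
    """
    tp = sum(1 for f, v in predicted.items() if gold.get(f) == v)
    fp = len(predicted) - tp
    fn = sum(1 for f in gold if predicted.get(f) != gold[f])
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical extractions for the transcript
# "His temperature is 39.1, which is 102.4 Fahrenheit."
example_outputs = {
    "model_a": {"temperature": "102.4 Fahrenheit", "pulse": "110"},
    "model_b": {"temperature": "102.4 Fahrenheit"},
    "model_c": {"temperature": "39.1", "pulse": "110", "spo2": "94"},
}
# Hypothetical gold-standard record for the same encounter.
example_gold = {
    "temperature": "102.4 Fahrenheit",
    "pulse": "110",
    "respiratory_rate": "22",
}

merged_record = integrate(example_outputs)
p, r, f1 = precision_recall_f1(merged_record, example_gold)
print(merged_record)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this toy case the vote recovers the Fahrenheit reading even though one model returned the Celsius figure, which is the kind of numerical extraction error the abstract mentions. Exact string matching is a deliberately strict scoring choice; a real evaluation would likely normalize units and formatting first.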

Source journal

Journal of Medical Systems (Medicine: Health Care)

CiteScore: 11.60

Self-citation rate: 1.90%

Articles published: (not given)

Review time: 4.8 months

About the journal: Journal of Medical Systems provides a forum for the presentation and discussion of the increasingly extensive applications of new systems techniques and methods in hospital clinic and physician's office administration; pathology, radiology, and pharmaceutical delivery systems; medical records storage and retrieval; and ancillary patient-support systems. The journal publishes informative articles, essays, and studies across the entire scale of medical systems, from large hospital programs to novel small-scale medical services. Education is an integral part of this amalgamation of sciences, and selected articles are published in this area. Since existing medical systems are constantly being modified to fit particular circumstances and to solve specific problems, the journal includes a special section devoted to status reports on current installations.