Enze Bai, Xiao Luo, Zhan Zhang, Kathleen Adelgais, Humaira Ali, Jack Finkelstein, Jared Kutzin
Title: Assessment and Integration of Large Language Models for Automated Electronic Health Record Documentation in Emergency Medical Services
Journal: Journal of Medical Systems, 49(1), article 65 (Journal Article)
Published: 2025-05-17
DOI: 10.1007/s10916-025-02197-w (https://doi.org/10.1007/s10916-025-02197-w)
Impact factor: 3.5; JCR Q1 (Health Care Sciences & Services)
Citation count: 0
Abstract
Automating Electronic Health Record (EHR) documentation can significantly reduce the burden on care providers, particularly in emergency care settings where rapid and accurate record-keeping is crucial. A critical aspect of this automation is using natural language processing (NLP) techniques to convert transcribed conversations into structured EHR fields. For instance, the temperature value "102.4 Fahrenheit" must be extracted from the transcribed text "His temperature is 39.1, which is 102.4 Fahrenheit." However, traditional rule-based and single-model NLP approaches often struggle with domain-specific medical terminology, contextual ambiguity, and numerical extraction errors. This study investigates the potential of integrating multiple Large Language Models (LLMs) to enhance the accuracy of Emergency Medical Services (EMS) documentation. We developed an LLM integration framework and evaluated four state-of-the-art LLMs (Claude 3.5, GPT-4, Gemini, and Mistral) on a dataset comprising transcribed conversations from 40 EMS training simulations. The evaluation focused on precision, recall, and F1 score across zero-shot and few-shot learning scenarios. Results showed that the integrated LLM framework outperformed individual models, achieving overall F1 scores of 0.78 (zero-shot) and 0.81 (few-shot). In addition to the quantitative evaluation, a preliminary user study was conducted with domain experts to assess the perceived usefulness and challenges of the integrated framework. The findings suggest that this approach has the potential to reduce documentation effort compared to traditional manual documentation. However, challenges such as misinterpretation of medical context and occasional omissions were noted, highlighting areas for further refinement and future work. This research is the first to systematically explore and evaluate the use of LLMs for real-time EMS EHR documentation. By addressing key challenges in automated transcription and structured data extraction, our work lays a foundation for real-world implementation, improving efficiency and accuracy in emergency medical documentation.
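To make the evaluation metrics concrete, the following is a minimal sketch of how precision, recall, and F1 could be computed for structured field extraction. It is not the authors' evaluation code: the abstract does not state the matching criterion, so exact matching on (field, value) pairs is an assumption, as are the function name `prf1`, the field names, and every value other than the temperature example quoted in the abstract.

```python
from typing import Dict, Tuple


def prf1(predicted: Dict[str, str], gold: Dict[str, str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over (field, value) pairs.

    Assumption: a prediction counts as correct only if both the field
    name and its value match the gold annotation exactly.
    """
    pred_pairs = set(predicted.items())
    gold_pairs = set(gold.items())
    true_positives = len(pred_pairs & gold_pairs)

    precision = true_positives / len(pred_pairs) if pred_pairs else 0.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data for illustration only. The temperature value comes from the
# abstract's example; the other fields and values are hypothetical.
gold = {
    "temperature": "102.4 Fahrenheit",
    "heart_rate": "110",
    "spo2": "94",
}
predicted = {
    "temperature": "102.4 Fahrenheit",  # correct extraction
    "heart_rate": "101",                # numerical extraction error
    # "spo2" omitted: counts against recall
}

p, r, f = prf1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case one of two predictions is correct (precision 0.50) and one of three gold fields is recovered (recall 0.33), giving an F1 of 0.40. A wrong number lowers both precision and recall, while an omission lowers only recall, which is why the two failure modes noted in the abstract affect the scores differently.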
About the journal:
Journal of Medical Systems provides a forum for the presentation and discussion of the increasingly extensive applications of new systems techniques and methods in hospital, clinic, and physician's office administration; pathology, radiology, and pharmaceutical delivery systems; medical records storage and retrieval; and ancillary patient-support systems. The journal publishes informative articles, essays, and studies across the entire scale of medical systems, from large hospital programs to novel small-scale medical services. Education is an integral part of this amalgamation of sciences, and selected articles are published in this area. Since existing medical systems are constantly being modified to fit particular circumstances and to solve specific problems, the journal includes a special section devoted to status reports on current installations.