Charlotte M H H T Bootsma-Robroeks, Jessica D Workum, Stephanie C E Schuit, Anne Hoekman, Tarannom Mehri, Job N Doornberg, Tom P van der Laan, Rosanne C Schoonbeek
AI-generated draft replies to patient messages: exploring effects of implementation.
Frontiers in Digital Health, 7:1588143. Published 2025-06-12 (eCollection 2025). DOI: https://doi.org/10.3389/fdgth.2025.1588143
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12198195/pdf/
Registration number: 19035.
Abstract
Introduction: The integration of Large Language Models (LLMs) in Electronic Health Records (EHRs) has the potential to reduce administrative burden. Validating these tools in real-world clinical settings is essential for responsible implementation. In this study, the effect of implementing LLM-generated draft responses to patient questions in our EHR is evaluated with regard to adoption, use and potential time savings.
Material and methods: Physicians across 14 medical specialties in a non-English large academic hospital were invited to use LLM-generated draft replies during this prospective observational clinical cohort study of 16 weeks, choosing either the drafted or a blank reply. The adoption rate, the level of adjustments to the initial drafted responses compared to the final sent messages (using ROUGE-1 and BLEU-1 natural language processing scores), and the time spent on these adjustments were analyzed.
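The similarity scoring mentioned above can be made concrete with a minimal sketch. This is not the authors' actual pipeline: it assumes whitespace tokenization, treats the LLM draft as the reference text, and omits BLEU's brevity penalty. The example strings are hypothetical.

```python
from collections import Counter

def _overlap(ref_tokens, cand_tokens):
    """Clipped unigram overlap between two token lists."""
    ref_counts = Counter(ref_tokens)
    cand_counts = Counter(cand_tokens)
    return sum(min(n, cand_counts[w]) for w, n in ref_counts.items())

def rouge1(reference: str, candidate: str) -> float:
    """ROUGE-1 recall: share of reference unigrams that appear in the candidate."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    return _overlap(ref, cand) / len(ref) if ref else 0.0

def bleu1(reference: str, candidate: str) -> float:
    """BLEU-1 unigram precision (brevity penalty omitted for simplicity)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    return _overlap(ref, cand) / len(cand) if cand else 0.0

# Hypothetical example: an LLM draft vs. the physician's final sent message
draft = "the patient should rest"
final = "the patient should rest and hydrate"
print(rouge1(draft, final))  # 1.0 — every draft unigram survives in the final text
print(bleu1(draft, final))   # ~0.667 — two of the six final-message words were added
```

A high ROUGE-1 score against the draft thus indicates the physician kept most of the suggested wording, while a low score indicates heavy editing or a largely rewritten reply.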
Results: A total of 919 messages by 100 physicians were evaluated. Clinicians used the LLM draft in 58% of replies. Of these, 43% used a large part of the suggested text in the final answer (drafted responses with a ≥10% match: ROUGE-1 86% similarity, vs. blank replies: ROUGE-1 16%). Total response time did not differ significantly between blank replies and drafted replies with a ≥10% match (157 vs. 153 s, p = 0.69).
Discussion: General adoption of LLM-generated draft responses to patient messages was 58%, although the level of adjustment to the drafted messages varied widely between medical specialties. This suggests the tool can be used safely in a non-English, tertiary setting. The current implementation has not yet resulted in time savings, but a learning curve can be expected.