Charlotte M H H T Bootsma-Robroeks, Jessica D Workum, Stephanie C E Schuit, Anne Hoekman, Tarannom Mehri, Job N Doornberg, Tom P van der Laan, Rosanne C Schoonbeek
AI-generated draft replies to patient messages: exploring effects of implementation.
Frontiers in Digital Health, 7:1588143. Published 2025-06-12 (eCollection 2025). DOI: https://doi.org/10.3389/fdgth.2025.1588143
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12198195/pdf/
Registration number: 19035.
Abstract
Introduction: The integration of Large Language Models (LLMs) in Electronic Health Records (EHRs) has the potential to reduce administrative burden. Validating these tools in real-world clinical settings is essential for responsible implementation. In this study, the effect of implementing LLM-generated draft responses to patient questions in our EHR is evaluated with regard to adoption, use and potential time savings.
Material and methods: Physicians across 14 medical specialties in a non-English large academic hospital were invited to use LLM-generated draft replies during this prospective observational clinical cohort study of 16 weeks, choosing either the drafted or a blank reply. The adoption rate, the level of adjustments to the initial drafted responses compared to the final sent messages (using ROUGE-1 and BLEU-1 natural language processing scores), and the time spent on these adjustments were analyzed.
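The similarity scoring mentioned above can be made concrete with a minimal sketch. This is not the authors' actual pipeline: it assumes whitespace tokenization, treats the LLM draft as the reference text, and omits BLEU's brevity penalty. The example strings are hypothetical.

```python
from collections import Counter

def _overlap(ref_tokens, cand_tokens):
    """Clipped unigram overlap between two token lists."""
    ref_counts = Counter(ref_tokens)
    cand_counts = Counter(cand_tokens)
    return sum(min(n, cand_counts[w]) for w, n in ref_counts.items())

def rouge1(reference: str, candidate: str) -> float:
    """ROUGE-1 recall: share of reference unigrams that appear in the candidate."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    return _overlap(ref, cand) / len(ref) if ref else 0.0

def bleu1(reference: str, candidate: str) -> float:
    """BLEU-1 unigram precision (brevity penalty omitted for simplicity)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    return _overlap(ref, cand) / len(cand) if cand else 0.0

# Hypothetical example: an LLM draft vs. the physician's final sent message
draft = "the patient should rest"
final = "the patient should rest and hydrate"
print(rouge1(draft, final))  # 1.0 — every draft unigram survives in the final text
print(bleu1(draft, final))   # ~0.667 — two of the six final-message words were added
```

A high ROUGE-1 score against the draft thus indicates the physician kept most of the suggested wording, while a low score indicates heavy editing or a largely rewritten reply.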
Results: A total of 919 messages by 100 physicians were evaluated. Clinicians used the LLM draft in 58% of replies. Of these, 43% used a large part of the suggested text in the final answer (drafted responses with a ≥10% match: ROUGE-1 86% similarity, vs. blank replies: ROUGE-1 16%). Total response time did not differ significantly between blank replies and drafted replies with a ≥10% match (157 vs. 153 s, p = 0.69).
Discussion: General adoption of LLM-generated draft responses to patient messages was 58%, although the level of adjustment to the drafted messages varied widely between medical specialties. This suggests the tool can be used safely in a non-English, tertiary setting. The current implementation has not yet resulted in time savings, but a learning curve can be expected.