Quality assessment of artificial intelligence-generated versus human-written hospital summaries evaluating detail, usefulness, and continuity of care

Douglas Challener MD, MS, Shant Ayanian MD, Alexander Ryu MD, John O'Horo MD, MPH, Heather Heaton MD, MS

Journal of Hospital Medicine, 21(4), 375-379. DOI: 10.1002/jhm.70163 (Epub 2025/9/30; published 2026-04-09)
https://shmpublications.onlinelibrary.wiley.com/doi/10.1002/jhm.70163
Citations: 0
Abstract
Background
Hospital discharge summaries are critical for ensuring continuity of care, but their quality often varies. Large language models (LLMs) have the potential to standardize and enhance the efficiency of this documentation process.
Objectives
To evaluate the quality of hospital discharge summaries generated by an LLM-based hospital course drafting tool developed by Epic Systems compared with human-written summaries.
Methods
Retrospective study at a single tertiary-care institution in 2024. The cohort included 100 adult hospitalizations lasting >72 h across medical and surgical dismissing services. No interventions were performed. Summaries (LLM-generated vs. human-written) were independently reviewed using a standardized rubric covering nine domains (e.g., comprehensiveness, clarity, relevance). Scores were normalized and compared. Readability was assessed using Flesch Reading Ease.
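The study reports readability with the Flesch Reading Ease score, a standard formula based on average sentence length and average syllables per word. As a rough illustration (not the authors' actual scoring pipeline, which likely used a validated tool), a minimal sketch with a naive syllable heuristic might look like:

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: each run of consecutive vowels counts as one
    # syllable, with a minimum of one per word. Real scorers use
    # dictionaries or more careful rules.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    # Standard Flesch Reading Ease formula:
    # 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    # Higher scores indicate simpler, more readable text.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

On this scale, the human summaries' higher mean score (33.11 vs. 26.2) corresponds to shorter sentences and/or fewer syllables per word, i.e., simpler language.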
Results
LLM-generated summaries outperformed human-written summaries across all criteria (p < .05), with the greatest difference observed in comprehensiveness (LLM median 0.62 vs. human −0.23). Human-written summaries from surgical services scored lower than those from medical services, but LLM performance was consistent across both. Human summaries had higher Flesch Reading Ease scores (33.11 vs. 26.2; p < .05), reflecting simpler language.
Conclusions
LLM-generated summaries demonstrated superior quality, consistency, and clinical utility compared with human-written summaries, highlighting their potential to improve documentation efficiency and standardization.
Journal description:
JHM is a peer-reviewed publication of the Society of Hospital Medicine and is published 12 times per year. JHM publishes manuscripts that address the care of hospitalized adults or children.
Broad areas of interest include (1) treatments for common inpatient conditions; (2) approaches to improving perioperative care; (3) improving care for hospitalized patients with geriatric or pediatric vulnerabilities (such as mobility problems or complex longitudinal care needs); (4) evaluation of innovative healthcare delivery or educational models; (5) approaches to improving the quality, safety, and value of healthcare across the acute and postacute continuum of care; and (6) evaluation of policy and payment changes that affect hospital and postacute care.