Evaluation of a large language model to simplify discharge summaries and provide cardiological lifestyle recommendations
Paul Rust, Julian Frings, Sven Meister, Leonard Fehring
Communications Medicine 5(1):208, published 29 May 2025. DOI: 10.1038/s43856-025-00927-2
Abstract
Background: Hospital discharge summaries are essential for continuity of care. However, medical jargon, abbreviations, and technical language often make them too complex for patients to understand, and they frequently omit lifestyle recommendations important for self-management. This study explored using a large language model (LLM) to improve the readability of discharge summaries and augment them with lifestyle recommendations.
Methods: We collected 20 anonymized cardiology discharge summaries. GPT-4o was prompted with full-text and segment-wise approaches to simplify each summary and to generate lifestyle recommendations. Readability was measured with three standardized metrics (modified Flesch Reading Ease, Vienna Non-fiction Text Formula, Lesbarkeitsindex), and 12 medical experts rated multiple quality dimensions.
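To make the readability measurement concrete, the following is a minimal sketch of the Lesbarkeitsindex (LIX), one of the three metrics named above: average sentence length plus the percentage of words longer than six letters. It illustrates the metric only, not the authors' evaluation pipeline; the sample sentence is hypothetical.

```python
import re

def lix(text: str) -> float:
    """Lesbarkeitsindex: words per sentence + percentage of words longer than 6 letters."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-zÄÖÜäöüß]+", text)
    long_words = [w for w in words if len(w) > 6]
    if not sentences or not words:
        return 0.0
    return len(words) / len(sentences) + 100.0 * len(long_words) / len(words)

# Hypothetical discharge-summary excerpt (German), for illustration only.
sample = ("Der Patient wurde mit Verdacht auf Myokardinfarkt aufgenommen. "
          "Die Koronarangiographie zeigte eine hochgradige Stenose.")
print(f"LIX: {lix(sample):.1f}")  # ≈ 54; values above ~50 indicate difficult text
```

Lower LIX scores indicate easier text, so a successful simplification should reduce the score of a summary relative to its original.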
Results: LLM-generated summaries from both prompting approaches are significantly more readable than the original summaries across all metrics (p < 0.0001). Based on 60 expert ratings for the full-text approach and 60 for the segment-wise approach, experts '(strongly) agree' that LLM-generated summaries are correct (full-text: 85%; segment-wise: 80%), complete (78%; 92%), harmless (83%; 88%), and comprehensible for patients (88%; 97%). Across 60 ratings, experts '(strongly) agree' that LLM-generated recommendations are relevant (92%), evidence-based (88%), personalized (70%), complete (88%), consistent (93%), and harmless (88%).
Conclusions: LLM-generated summaries achieve a 10th-grade readability level and high-quality ratings. While LLM-generated lifestyle recommendations are generally of high quality, personalization is limited. These findings suggest that LLMs could help create more patient-centric discharge summaries. Further research is needed to confirm clinical utility and address quality assurance, regulatory compliance, and clinical integration challenges.