A proof-of-concept study for patient use of open notes with large language models.

IF 2.5, Q2 (Health Care Sciences & Services)
JAMIA Open. Pub Date: 2025-04-09; eCollection Date: 2025-04-01. DOI: 10.1093/jamiaopen/ooaf021
Liz Salmi, Dana M Lewis, Jennifer L Clarke, Zhiyong Dong, Rudy Fischmann, Emily I McIntosh, Chethan R Sarabu, Catherine M DesRoches

Abstract

Objectives: The use of large language models (LLMs) is growing for both clinicians and patients. While researchers and clinicians have explored LLMs to manage patient portal messages and reduce burnout, there is less documentation about how patients use these tools to understand clinical notes and inform decision-making. This proof-of-concept study examined the reliability and accuracy of LLMs in responding to patient queries based on an open visit note.

Materials and methods: In a cross-sectional proof-of-concept study, 3 commercially available LLMs (ChatGPT 4o, Claude 3 Opus, Gemini 1.5) were evaluated using 4 distinct prompt series (Standard, Randomized, Persona, and Randomized Persona) with multiple patient-designed questions about a single neuro-oncology progress note. LLM responses were scored by the note author (a neuro-oncologist) and a patient who receives care from the note author, using an 8-criterion rubric that assessed Accuracy, Relevance, Clarity, Actionability, Empathy/Tone, Completeness, Evidence, and Consistency. Descriptive statistics were used to summarize the performance of each LLM across all prompts.
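The descriptive-statistics step above can be sketched in code. The sketch below is illustrative only: the abstract does not specify the rubric's score scale or data layout, so the 1-5 scale, the example values, and all names (`CRITERIA`, `summarize`) are assumptions for demonstration.

```python
# Illustrative sketch: mean rubric score per (LLM, prompt series) pair,
# as in the study's descriptive summary. All score values are made up.
from statistics import mean

CRITERIA = ["Accuracy", "Relevance", "Clarity", "Actionability",
            "Empathy/Tone", "Completeness", "Evidence", "Consistency"]

# scores[(llm, series)] -> one per-criterion score dict per rated response
scores = {
    ("ChatGPT 4o", "Persona"): [
        {c: 5 for c in CRITERIA} | {"Evidence": 2},
        {c: 4 for c in CRITERIA} | {"Evidence": 1},
    ],
    ("Gemini 1.5", "Standard"): [
        {c: 3 for c in CRITERIA},
    ],
}

def summarize(scores):
    """Mean score on each criterion for every (LLM, prompt series) pair."""
    return {
        key: {c: mean(r[c] for r in responses) for c in CRITERIA}
        for key, responses in scores.items()
    }

summary = summarize(scores)
print(summary[("ChatGPT 4o", "Persona")]["Evidence"])  # -> 1.5
```

A table of such means per criterion is enough to surface the pattern the study reports, e.g. uniformly low Evidence scores across models.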

Results: Overall, the Standard and Persona-based prompt series yielded the best results across all criteria, regardless of LLM. ChatGPT 4o with Persona-based prompts scored highest in all categories. All LLMs scored low on the Evidence criterion.

Discussion: This proof-of-concept study highlighted the potential for LLMs to assist patients in interpreting open notes. The most effective LLM responses were achieved by applying Persona-style prompts to a patient's question.

Conclusion: Optimizing LLMs for patient-driven queries, together with patient education and counseling on LLM use, has the potential to enhance patients' use and understanding of their health information.

Source journal: JAMIA Open (Medicine, Health Informatics)
CiteScore: 4.10
Self-citation rate: 4.80%
Articles per year: 102
Review time: 16 weeks