Generative AI Demonstrated Difficulty Reasoning on Nursing Flowsheet Data

Courtney J Diamond, Jennifer Thate, Jennifer B Withall, Rachel Y Lee, Kenrick Cato, Sarah C Rossetti

AMIA Annual Symposium Proceedings, vol. 2024, pp. 349-358. Published online 2025-05-22.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12099445/pdf/
Abstract
Excessive documentation burden is linked to clinician burnout, motivating efforts to reduce it. Generative artificial intelligence (AI) offers opportunities for burden reduction but requires rigorous assessment. We evaluated the ability of a large language model (LLM), OpenAI's GPT-4, to interpret various intervention-response relationships presented on nursing flowsheets, assessing performance using MUC-5 evaluation metrics, and compared its assessments to those of nurse expert evaluators. ChatGPT correctly assessed 3 of 14 clinical scenarios and partially correctly assessed 6 of 14, frequently omitting data from its reasoning. Nurse expert evaluators correctly assessed all relationships and provided additional language reflective of standard nursing practice beyond the intervention-response relationships evidenced in nursing flowsheets. Future work should ensure that the training data used for electronic health record (EHR)-integrated LLMs includes all types of narrative nursing documentation that reflect nurses' clinical reasoning, and that verification of LLM-based information summarization does not burden end-users.