Why do users override alerts? Utilizing large language model to summarize comments and optimize clinical decision support.

Impact factor 4.7 | Zone 2 (Medicine) | JCR Q1 (Computer Science, Information Systems)

Journal of the American Medical Informatics Association Pub Date : 2024-05-20 DOI:10.1093/jamia/ocae041

Siru Liu, Allison B McCoy, Aileen P Wright, Scott D Nelson, Sean S Huang, Hasan B Ahmad, Sabrina E Carro, Jacob Franklin, James Brogan, Adam Wright

Pages: 1388-1396. Publication type: Journal Article. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11105133/pdf/

Citations: 0

Abstract

Objectives: To evaluate the capability of using generative artificial intelligence (AI) in summarizing alert comments and to determine if the AI-generated summary could be used to improve clinical decision support (CDS) alerts.

Materials and methods: We extracted user comments to alerts generated from September 1, 2022 to September 1, 2023 at Vanderbilt University Medical Center. For a subset of 8 alerts, comment summaries were generated independently by 2 physicians and then separately by GPT-4. We surveyed 5 CDS experts to rate the human-generated and AI-generated summaries on a scale from 1 (strongly disagree) to 5 (strongly agree) for the 4 metrics: clarity, completeness, accuracy, and usefulness.
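The size of the rating task implied by this design can be worked out directly. The sketch below enumerates one rating slot per (alert, summary source, expert, metric) combination. It assumes every expert rated every summary on every metric, which the abstract does not state explicitly; the names `rating_slots`, `physician_1`, `physician_2`, and `gpt4` are hypothetical labels chosen for illustration.

```python
from itertools import product

# Counts taken from the abstract.
ALERTS = 8
HUMAN_SUMMARIZERS = 2   # physicians writing summaries independently
EXPERTS = 5             # CDS experts rating the summaries
METRICS = ("clarity", "completeness", "accuracy", "usefulness")


def rating_slots():
    """Enumerate every rating slot under a fully crossed design (an assumption)."""
    sources = [f"physician_{i + 1}" for i in range(HUMAN_SUMMARIZERS)] + ["gpt4"]
    return [
        {"alert": alert, "source": source, "expert": expert, "metric": metric}
        for alert, source, expert, metric in product(
            range(1, ALERTS + 1), sources, range(1, EXPERTS + 1), METRICS
        )
    ]


slots = rating_slots()
summaries = {(s["alert"], s["source"]) for s in slots}
human_summaries = {key for key in summaries if key[1] != "gpt4"}
ai_summaries = summaries - human_summaries

print(len(human_summaries), "human-generated summaries")
print(len(ai_summaries), "AI-generated summaries")
print(len(slots), "individual ratings if the design is fully crossed")
```

Under that assumption, the summary counts agree with the 16 human-generated and 8 AI-generated summaries reported in the Results.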

Results: Five CDS experts participated in the survey. A total of 16 human-generated summaries and 8 AI-generated summaries were assessed. Among the top 8 rated summaries, five were generated by GPT-4. AI-generated summaries demonstrated high levels of clarity, accuracy, and usefulness, similar to the human-generated summaries. Moreover, AI-generated summaries exhibited significantly higher completeness and usefulness compared to the human-generated summaries (AI: 3.4 ± 1.2, human: 2.7 ± 1.2, P = .001).
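Figures such as "3.4 ± 1.2" are a mean and standard deviation over 1-to-5 ratings. The sketch below shows one way to produce that format with Python's standard library. The ratings in it are made-up illustrative values, not data from the study, and the function name `mean_sd` is hypothetical. The abstract does not say which statistical test produced the P value, so no test is reproduced here.

```python
from statistics import mean, stdev


def mean_sd(scores):
    """Format 1-to-5 Likert ratings as 'mean ± sample SD' with one decimal."""
    if len(scores) < 2:
        raise ValueError("need at least two ratings to compute a sample SD")
    for score in scores:
        if not 1 <= score <= 5:
            raise ValueError(f"rating {score} is outside the 1-5 scale")
    return f"{mean(scores):.1f} ± {stdev(scores):.1f}"


# Illustrative ratings only (NOT the study's data).
example_ai = [5, 4, 3, 2, 4, 3]
example_human = [3, 2, 4, 1, 3, 2]

print("AI:   ", mean_sd(example_ai))
print("Human:", mean_sd(example_human))
```

`stdev` computes the sample standard deviation (n - 1 denominator); whether the paper used the sample or population form is not stated in the abstract.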

Conclusion: End-user comments capture clinicians' immediate feedback on CDS alerts and can serve as a direct and valuable data resource for improving CDS delivery. Traditionally, these comments may not be considered in the CDS review process because they are unstructured, high in volume, and often contain redundant or irrelevant content. Our study demonstrates that GPT-4 is capable of distilling these comments into summaries characterized by high clarity, accuracy, and completeness. AI-generated summaries are equivalent to, and potentially better than, human-generated summaries. These AI-generated summaries could provide CDS experts with a novel means of reviewing user comments to rapidly optimize CDS alerts both online and offline.


Source journal

Journal of the American Medical Informatics Association (Medicine; Computer Science, Interdisciplinary Applications)

CiteScore: 14.50

Self-citation rate: 7.80%

Articles published: 230

Review time: 3-8 weeks

About the journal: JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives, and reviews also help readers stay connected with the most important informatics developments in implementation, policy, and education.