Large language model triaging of simulated nephrology patient inbox messages.

IF 3.0 · Q2 · Computer Science, Artificial Intelligence
Frontiers in Artificial Intelligence Pub Date : 2024-09-09 eCollection Date: 2024-01-01 DOI:10.3389/frai.2024.1452469
Justin H Pham, Charat Thongprayoon, Jing Miao, Supawadee Suppadungsuk, Priscilla Koirala, Iasmina M Craici, Wisit Cheungpasitporn

Abstract

Background: Efficient triage of patient communications is crucial for timely medical attention and improved care. This study evaluates ChatGPT's accuracy in categorizing nephrology patient inbox messages, assessing its potential in outpatient settings.

Methods: One hundred and fifty simulated patient inbox messages were created based on cases typically encountered in everyday practice at a nephrology outpatient clinic. These messages were triaged as non-urgent, urgent, or emergent by two nephrologists. The messages were then submitted to ChatGPT-4 for independent triage into the same categories. The inquiry process was performed twice, with a two-week interval in between. ChatGPT responses were graded as correct (agreement with physicians), overestimation (higher priority than physicians), or underestimation (lower priority than physicians).
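The grading scheme above compares the model's label to the physician consensus on an ordered three-level urgency scale. A minimal sketch of that comparison (the function name and label strings are illustrative, not taken from the study's materials):

```python
# Ordered urgency scale matching the study's three triage categories.
URGENCY = {"non-urgent": 0, "urgent": 1, "emergent": 2}

def grade_response(physician_label: str, model_label: str) -> str:
    """Grade a model triage against the physician consensus.

    Returns "correct" on agreement, "overestimation" when the model
    assigns a higher priority, and "underestimation" when lower.
    """
    diff = URGENCY[model_label] - URGENCY[physician_label]
    if diff == 0:
        return "correct"
    return "overestimation" if diff > 0 else "underestimation"
```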

Results: In the first trial, ChatGPT correctly triaged 140 (93%) messages, overestimated the priority of 4 messages (3%), and underestimated the priority of 6 messages (4%). In the second trial, it correctly triaged 140 (93%) messages, overestimated the priority of 9 (6%), and underestimated the priority of 1 (1%). The accuracy did not depend on the urgency level of the message (p = 0.19). The internal agreement of ChatGPT responses was 92% with an intra-rater Kappa score of 0.88.
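The intra-rater agreement reported above is Cohen's kappa computed between the model's two trials on the same 150 messages. A minimal sketch of that computation, shown here on synthetic labels rather than the study's data:

```python
from collections import Counter

def cohen_kappa(trial1: list[str], trial2: list[str]) -> float:
    """Cohen's kappa between two label sets for the same items."""
    n = len(trial1)
    # Observed agreement: fraction of items labeled identically.
    observed = sum(a == b for a, b in zip(trial1, trial2)) / n
    # Chance agreement from each trial's marginal label frequencies.
    c1, c2 = Counter(trial1), Counter(trial2)
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return (observed - expected) / (1 - expected)
```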

Conclusion: ChatGPT-4 demonstrated high accuracy in triaging nephrology patient messages, highlighting the potential for AI-driven triage systems to enhance operational efficiency and improve patient care in outpatient clinics.

Source journal: Frontiers in Artificial Intelligence
CiteScore: 6.10
Self-citation rate: 2.50%
Articles published: 272
Review time: 13 weeks