Human performance evaluation of a pediatric artificial intelligence sepsis model.

IF 4.6 | Zone 2 (Medicine) | Q1, Computer Science, Information Systems

Journal of the American Medical Informatics Association. Pub Date: 2025-10-01. DOI: 10.1093/jamia/ocaf106

Swaminathan Kandaswamy, Naveen Muthu, Nikolay Braykov, Rebekah Carter, Reena Blanco, Thuy Bui, Evan Orenstein, Mark Mai

Pages: 1552-1561. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12451936/pdf/

Citations: 0

Abstract

Objective: To assess the influence on human performance measures of an implemented artificial intelligence model that predicts pediatric sepsis (as defined by the Improving Pediatric Sepsis Outcomes [IPSO] collaborative) in the emergency department (ED).

Materials and methods: The study took place at two ED sites within a large pediatric health system in the Southeastern United States between January 1, 2021, and April 1, 2024. We interviewed ED providers and nurses within 72 hours of their caring for a patient identified by the predictive model as potentially having sepsis. Thematic analysis of qualitative data was combined with electronic health record queries to assess measures of human performance, including situation awareness, explainability, human-computer agreement, workload, trust, automation bias, and the relationship between staff and patients.

Results: We interviewed 40 clinicians. Participants found that the sepsis alert improved situation awareness, leading to changes in patient care management, resource allocation, and/or monitoring. Participants reported an average trust in the model-based alert of 3.8/5. Only 28% (555/1977) of sepsis huddles were conducted without the alert firing, suggesting some automation bias. Antibiotic treatment for IPSO sepsis cases was similar pre- and post-intervention without a huddle (9.3% vs 10.5%), though treatment doubled with the huddle intervention (22.7%). The NASA Task Load Index score increased from 43 to 57 post-intervention. There were no reports of adverse relationships with patients post-intervention.
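The figures reported in the Results can be checked with simple arithmetic. The following is a minimal sketch in Python, not part of the study: the helper `percent` and all variable names are hypothetical and introduced only for illustration, and the inputs are the counts and rates quoted in the abstract.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


# Values copied from the Results paragraph of the abstract.
huddles_without_alert = 555   # sepsis huddles held without the alert firing
total_huddles = 1977          # all sepsis huddles

abx_pre_no_huddle = 9.3       # % treated with antibiotics, pre-intervention
abx_post_no_huddle = 10.5     # % treated, post-intervention, no huddle
abx_post_huddle = 22.7        # % treated, post-intervention, with huddle

tlx_pre, tlx_post = 43, 57    # NASA Task Load Index scores

# Share of huddles that happened without an alert (reported as 28%).
share_without_alert = percent(huddles_without_alert, total_huddles)

# Ratio of treatment rates with vs without a huddle (reported as "doubled").
huddle_ratio = abx_post_huddle / abx_post_no_huddle

# Absolute change in the workload score.
tlx_change = tlx_post - tlx_pre

print(f"Huddles without alert: {share_without_alert:.1f}%")
print(f"Treatment ratio, huddle vs no huddle: {huddle_ratio:.2f}x")
print(f"NASA-TLX change: +{tlx_change} points")
```

Under these inputs, the share rounds to the reported 28%, and the huddle treatment rate is a little more than twice the post-intervention rate without a huddle, consistent with the wording "doubled".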

Discussion: Effects on human performance appeared generally positive, with improved situation awareness and satisfaction with the alert-driven huddle. However, there was some evidence of automation bias and a slight increase in workload with the intervention.

Conclusion: This study demonstrates the feasibility of evaluating multiple dimensions of human performance using a mixed methods approach for an AI model implemented in clinical practice. Future studies should aim to reduce the measurement burden of human performance metrics associated with AI implementation in acute care settings and assess the correlation between human performance measures and clinical outcomes.


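Source journal: Journal of the American Medical Informatics Association (Medicine; Computer Science, Interdisciplinary Applications)

CiteScore: 14.50

Self-citation rate: 7.80%

Articles published: 230

Review time: 3-8 weeks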

About the journal: JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives, and reviews also help readers stay connected with the most important informatics developments in implementation, policy, and education.