{"title":"再生人工智能时代的护理判断:急诊护士临床决策绩效的跨国研究","authors":"C. Levin , A. Zaboli , G. Turcato , M. Saban","doi":"10.1016/j.ijnurstu.2025.105216","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>Clinical decision-making is a core competency in emergency nursing, requiring rapid and accurate assessments. With the growing integration of Generative Artificial Intelligence in healthcare, there is a pressing need to understand its potential as a clinical decision support tool. While Generative Artificial Intelligence models show high accuracy and consistency, their ability to navigate complex, context-sensitive scenarios remains in question.</div></div><div><h3>Objectives</h3><div>This study aimed to compare the clinical decision-making performance of emergency nurses from Israel and Italy with Generative Artificial Intelligence models (Claude-3.5, ChatGPT-4.0, and Gemini-1.5). It evaluated differences in severity assessment, hospitalization decisions, and test selection, while exploring the influence of demographic and professional characteristics on decision accuracy.</div></div><div><h3>Methods</h3><div>A prospective observational study was conducted among 82 emergency nurses (49 from Italy, 33 from Israel), each independently evaluating five standardized clinical cases. Their decisions were compared with those generated by Generative Artificial Intelligence models using a structured evaluation rubric. Statistical analyses included ANOVA, chi-square tests, logistic regression, and receiver operating characteristic curve analysis to assess predictive accuracy.</div></div><div><h3>Results</h3><div>Generative Artificial Intelligence models exhibited higher overall decision accuracy and stronger alignment with expert recommendations. However, notable discrepancies emerged in hospitalization decisions and severity assessments. For example, in Case 2, Generative Artificial Intelligence rated severity as level 1, while Italian and Israeli nurses rated it at 1.98 and 2.23, respectively (P < 0.01, F = 199). In Case 1, only 4.1 % of Italian nurses recommended hospitalization compared to 30.3 % of Israeli nurses, whereas all Generative Artificial Intelligence models advised hospitalization. Nurses showed greater variability in test selection and severity judgments, reflecting their use of clinical intuition and contextual reasoning. Demographics such as age, gender, and years of experience did not significantly predict accuracy.</div></div><div><h3>Conclusions</h3><div>Generative Artificial Intelligence models demonstrated consistency and expert alignment but lacked the contextual sensitivity vital in emergency care. These results highlight the potential of Generative Artificial Intelligence as a clinical decision-support tool while emphasizing the continued importance of human clinical judgment.</div></div>","PeriodicalId":50299,"journal":{"name":"International Journal of Nursing Studies","volume":"172 ","pages":"Article 105216"},"PeriodicalIF":7.1000,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Nursing judgment in the age of generative artificial intelligence: A cross-national study on clinical decision-making performance among emergency nurses\",\"authors\":\"C. Levin , A. Zaboli , G. Turcato , M. 
Saban\",\"doi\":\"10.1016/j.ijnurstu.2025.105216\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Background</h3><div>Clinical decision-making is a core competency in emergency nursing, requiring rapid and accurate assessments. With the growing integration of Generative Artificial Intelligence in healthcare, there is a pressing need to understand its potential as a clinical decision support tool. While Generative Artificial Intelligence models show high accuracy and consistency, their ability to navigate complex, context-sensitive scenarios remains in question.</div></div><div><h3>Objectives</h3><div>This study aimed to compare the clinical decision-making performance of emergency nurses from Israel and Italy with Generative Artificial Intelligence models (Claude-3.5, ChatGPT-4.0, and Gemini-1.5). It evaluated differences in severity assessment, hospitalization decisions, and test selection, while exploring the influence of demographic and professional characteristics on decision accuracy.</div></div><div><h3>Methods</h3><div>A prospective observational study was conducted among 82 emergency nurses (49 from Italy, 33 from Israel), each independently evaluating five standardized clinical cases. Their decisions were compared with those generated by Generative Artificial Intelligence models using a structured evaluation rubric. Statistical analyses included ANOVA, chi-square tests, logistic regression, and receiver operating characteristic curve analysis to assess predictive accuracy.</div></div><div><h3>Results</h3><div>Generative Artificial Intelligence models exhibited higher overall decision accuracy and stronger alignment with expert recommendations. However, notable discrepancies emerged in hospitalization decisions and severity assessments. For example, in Case 2, Generative Artificial Intelligence rated severity as level 1, while Italian and Israeli nurses rated it at 1.98 and 2.23, respectively (P < 0.01, F = 199). In Case 1, only 4.1 % of Italian nurses recommended hospitalization compared to 30.3 % of Israeli nurses, whereas all Generative Artificial Intelligence models advised hospitalization. Nurses showed greater variability in test selection and severity judgments, reflecting their use of clinical intuition and contextual reasoning. Demographics such as age, gender, and years of experience did not significantly predict accuracy.</div></div><div><h3>Conclusions</h3><div>Generative Artificial Intelligence models demonstrated consistency and expert alignment but lacked the contextual sensitivity vital in emergency care. 
These results highlight the potential of Generative Artificial Intelligence as a clinical decision-support tool while emphasizing the continued importance of human clinical judgment.</div></div>\",\"PeriodicalId\":50299,\"journal\":{\"name\":\"International Journal of Nursing Studies\",\"volume\":\"172 \",\"pages\":\"Article 105216\"},\"PeriodicalIF\":7.1000,\"publicationDate\":\"2025-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Nursing Studies\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0020748925002263\",\"RegionNum\":1,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"NURSING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Nursing Studies","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0020748925002263","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"NURSING","Score":null,"Total":0}
Citations: 0
Abstract
Background
Clinical decision-making is a core competency in emergency nursing, requiring rapid and accurate assessments. With the growing integration of Generative Artificial Intelligence in healthcare, there is a pressing need to understand its potential as a clinical decision support tool. While Generative Artificial Intelligence models show high accuracy and consistency, their ability to navigate complex, context-sensitive scenarios remains in question.
Objectives
This study aimed to compare the clinical decision-making performance of emergency nurses from Israel and Italy with Generative Artificial Intelligence models (Claude-3.5, ChatGPT-4.0, and Gemini-1.5). It evaluated differences in severity assessment, hospitalization decisions, and test selection, while exploring the influence of demographic and professional characteristics on decision accuracy.
Methods
A prospective observational study was conducted among 82 emergency nurses (49 from Italy, 33 from Israel), each independently evaluating five standardized clinical cases. Their decisions were compared with those generated by Generative Artificial Intelligence models using a structured evaluation rubric. Statistical analyses included ANOVA, chi-square tests, logistic regression, and receiver operating characteristic curve analysis to assess predictive accuracy.
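The Methods name the statistical procedures but not their implementation. The sketch below is a minimal, hypothetical illustration of how an ANOVA across rater groups, a chi-square test on hospitalization decisions, and a logistic-regression/ROC analysis of decision accuracy could be run in Python. It is not the authors' code; the simulated dataset, variable names, and group labels are all assumptions made for illustration.

```python
# Illustrative sketch only -- not the study's code or data.
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical per-rater records: rater group, severity rating, hospitalization
# decision, experience, and agreement with the expert rubric (all simulated).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": rng.choice(["IT_nurse", "IL_nurse", "AI"], size=300),
    "severity": rng.integers(1, 6, size=300),
    "hospitalize": rng.integers(0, 2, size=300),
    "years_experience": rng.integers(1, 30, size=300),
    "correct": rng.integers(0, 2, size=300),  # matched expert recommendation
})

# One-way ANOVA: do mean severity ratings differ across rater groups?
groups = [g["severity"].to_numpy() for _, g in df.groupby("group")]
f_stat, p_anova = stats.f_oneway(*groups)

# Chi-square test: are hospitalization decisions associated with rater group?
table = pd.crosstab(df["group"], df["hospitalize"])
chi2, p_chi2, dof, _ = stats.chi2_contingency(table)

# Logistic regression + ROC AUC: does experience predict decision accuracy?
X = df[["years_experience"]]
y = df["correct"]
model = LogisticRegression().fit(X, y)
auc = roc_auc_score(y, model.predict_proba(X)[:, 1])

print(f"ANOVA F={f_stat:.2f}, p={p_anova:.3f}; chi2 p={p_chi2:.3f}; AUC={auc:.2f}")
```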
Results
Generative Artificial Intelligence models exhibited higher overall decision accuracy and stronger alignment with expert recommendations. However, notable discrepancies emerged in hospitalization decisions and severity assessments. For example, in Case 2, Generative Artificial Intelligence rated severity as level 1, while Italian and Israeli nurses rated it at 1.98 and 2.23, respectively (P < 0.01, F = 199). In Case 1, only 4.1 % of Italian nurses recommended hospitalization compared to 30.3 % of Israeli nurses, whereas all Generative Artificial Intelligence models advised hospitalization. Nurses showed greater variability in test selection and severity judgments, reflecting their use of clinical intuition and contextual reasoning. Demographics such as age, gender, and years of experience did not significantly predict accuracy.
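As a worked illustration of the Case 1 hospitalization gap, the reported percentages imply roughly 2 of 49 Italian nurses (4.1 %) and 10 of 33 Israeli nurses (30.3 %) recommending admission. These counts are reconstructed from the abstract rather than taken from the study data; the snippet below only shows how such a difference in proportions could be tested.

```python
# Counts reconstructed from the reported percentages and group sizes
# (4.1 % of 49 ~ 2; 30.3 % of 33 = 10) -- illustrative only, not study data.
from scipy import stats

table = [[2, 47],   # Italian nurses: recommended vs. did not recommend hospitalization
         [10, 23]]  # Israeli nurses: recommended vs. did not recommend hospitalization

chi2, p, dof, expected = stats.chi2_contingency(table)
# Fisher's exact test is preferable here because one expected cell count falls below 5.
odds_ratio, p_fisher = stats.fisher_exact(table)
print(f"chi-square p = {p:.4f}; Fisher exact p = {p_fisher:.4f}")
```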
Conclusions
Generative Artificial Intelligence models demonstrated consistency and expert alignment but lacked the contextual sensitivity vital in emergency care. These results highlight the potential of Generative Artificial Intelligence as a clinical decision-support tool while emphasizing the continued importance of human clinical judgment.
About the journal:
The International Journal of Nursing Studies (IJNS) is a highly respected journal that has been publishing original peer-reviewed articles since 1963. It provides a forum for original research and scholarship about health care delivery, organisation, management, workforce, policy, and research methods relevant to nursing, midwifery, and other health related professions. The journal aims to support evidence informed policy and practice by publishing research, systematic and other scholarly reviews, critical discussion, and commentary of the highest standard. The IJNS is indexed in major databases including PubMed, Medline, Thomson Reuters - Science Citation Index, Scopus, Thomson Reuters - Social Science Citation Index, CINAHL, and the BNI (British Nursing Index).