Nursing judgment in the age of generative artificial intelligence: A cross-national study on clinical decision-making performance among emergency nurses
Authors: C. Levin, A. Zaboli, G. Turcato, M. Saban
Journal: International Journal of Nursing Studies, volume 172, Article 105216 (impact factor 7.1; JCR Q1, Nursing)
Publication date: 2025-09-12
DOI: 10.1016/j.ijnurstu.2025.105216
URL: https://www.sciencedirect.com/science/article/pii/S0020748925002263
Citation count: 0
Abstract
Background
Clinical decision-making is a core competency in emergency nursing, requiring rapid and accurate assessments. With the growing integration of Generative Artificial Intelligence in healthcare, there is a pressing need to understand its potential as a clinical decision support tool. While Generative Artificial Intelligence models show high accuracy and consistency, their ability to navigate complex, context-sensitive scenarios remains in question.
Objectives
This study aimed to compare the clinical decision-making performance of emergency nurses from Israel and Italy with Generative Artificial Intelligence models (Claude-3.5, ChatGPT-4.0, and Gemini-1.5). It evaluated differences in severity assessment, hospitalization decisions, and test selection, while exploring the influence of demographic and professional characteristics on decision accuracy.
Methods
A prospective observational study was conducted among 82 emergency nurses (49 from Italy, 33 from Israel), each independently evaluating five standardized clinical cases. Their decisions were compared with those generated by Generative Artificial Intelligence models using a structured evaluation rubric. Statistical analyses included ANOVA, chi-square tests, logistic regression, and receiver operating characteristic curve analysis to assess predictive accuracy.
Results
Generative Artificial Intelligence models exhibited higher overall decision accuracy and stronger alignment with expert recommendations. However, notable discrepancies emerged in hospitalization decisions and severity assessments. For example, in Case 2, Generative Artificial Intelligence rated severity as level 1, while Italian and Israeli nurses rated it at 1.98 and 2.23, respectively (F = 199, P < 0.01). In Case 1, only 4.1% of Italian nurses recommended hospitalization compared with 30.3% of Israeli nurses, whereas all Generative Artificial Intelligence models advised hospitalization. Nurses showed greater variability in test selection and severity judgments, reflecting their use of clinical intuition and contextual reasoning. Demographics such as age, gender, and years of experience did not significantly predict accuracy.
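As a rough illustration of the Case 1 comparison, the sketch below runs a Pearson chi-square test on a 2x2 table of hospitalization recommendations by country. The counts are not reported in the abstract; they are inferred here by rounding the stated percentages against the stated group sizes (4.1% of 49 Italian nurses, 30.3% of 33 Israeli nurses), and the function name `chi_square_2x2` is a hypothetical helper, not the authors' analysis code. The abstract does not say which test was applied to this particular comparison, so this is a sketch under those assumptions.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Shortcut formula for a 2x2 table: n * (ad - bc)^2 / (product of the margins)
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Assumption: counts reconstructed from the percentages in the abstract.
italy_n, israel_n = 49, 33
italy_yes = round(0.041 * italy_n)    # nurses recommending hospitalization
israel_yes = round(0.303 * israel_n)

stat, p_value = chi_square_2x2(
    italy_yes, italy_n - italy_yes,
    israel_yes, israel_n - israel_yes,
)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

With the inferred counts, one expected cell count falls just below 5, so an exact test such as Fisher's might be preferred in practice; the chi-square version is shown only because the Methods list chi-square tests among the analyses.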
Conclusions
Generative Artificial Intelligence models demonstrated consistency and expert alignment but lacked the contextual sensitivity vital in emergency care. These results highlight the potential of Generative Artificial Intelligence as a clinical decision-support tool while emphasizing the continued importance of human clinical judgment.
About the journal:
The International Journal of Nursing Studies (IJNS) is a highly respected journal that has been publishing original peer-reviewed articles since 1963. It provides a forum for original research and scholarship about health care delivery, organisation, management, workforce, policy, and research methods relevant to nursing, midwifery, and other health-related professions. The journal aims to support evidence-informed policy and practice by publishing research, systematic and other scholarly reviews, critical discussion, and commentary of the highest standard. The IJNS is indexed in major databases including PubMed, Medline, Thomson Reuters - Science Citation Index, Scopus, Thomson Reuters - Social Science Citation Index, CINAHL, and the BNI (British Nursing Index).