Safety of human-AI cooperative decision-making within intensive care: A physical simulation study

Paul Festor, Myura Nagendran, Anthony C Gordon, Aldo A Faisal, Matthieu Komorowski

PLOS Digital Health 4(2): e0000726. Published 2025-02-24 (eCollection 2025/2). DOI: 10.1371/journal.pdig.0000726. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11849858/pdf/
Citations: 0
Abstract
The safety of Artificial Intelligence (AI) systems is as much a question of human decision-making as a technological one. In AI-driven decision support systems, particularly in high-stakes settings such as healthcare, ensuring the safety of human-AI interactions is paramount, given the potential risks of following erroneous AI recommendations. To explore this question, we ran a safety-focused clinician-AI interaction study in a physical simulation suite. Physicians were placed in a simulated intensive care ward with a human nurse (played by an experimenter), an ICU data chart, a high-fidelity patient mannequin and an AI recommender system on a display. Clinicians were asked to prescribe two drugs for the simulated patients suffering from sepsis and wore eye-tracking glasses so that we could assess where their gaze was directed. We recorded clinician treatment plans before and after they saw the AI treatment recommendations, which could be either 'safe' or 'unsafe'. 92% of clinicians rejected unsafe AI recommendations versus 29% of safe ones. Physicians paid more attention (+37% gaze fixations) to unsafe AI recommendations than to safe ones. However, visual attention on AI explanations was not greater in unsafe scenarios. Similarly, clinical information (patient monitor, patient chart) did not receive more attention after an unsafe versus a safe AI reveal, suggesting that physicians did not look back to these sources of information to investigate why the AI suggestion might be unsafe. Scripted comments from the bedside nurse persuaded physicians to change their dose only 5% of the time. Our study emphasises the importance of human oversight in safety-critical AI and the value of evaluating human-AI systems in high-fidelity settings that more closely resemble real-world practice.
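To make the reported comparisons concrete, the sketch below shows one way the headline quantities could be tabulated from per-trial records: the rejection rate of AI recommendations in the 'safe' versus 'unsafe' condition, and the mean number of gaze fixations on the AI display in each condition. This is an illustrative assumption, not the authors' analysis code; the Trial structure, the helper functions, and the toy numbers are hypothetical, whereas the paper itself reports 92% rejection of unsafe versus 29% of safe recommendations and about 37% more fixations on unsafe ones.

```python
# Hypothetical sketch: tabulating rejection rates and mean gaze fixations
# per safety condition. Data values below are illustrative placeholders.
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    condition: str      # "safe" or "unsafe" AI recommendation
    rejected: bool      # did the clinician reject the AI-suggested dose?
    ai_fixations: int   # eye-tracking fixations on the AI display


def rejection_rate(trials: list[Trial], condition: str) -> float:
    subset = [t for t in trials if t.condition == condition]
    return sum(t.rejected for t in subset) / len(subset)


def mean_fixations(trials: list[Trial], condition: str) -> float:
    return mean(t.ai_fixations for t in trials if t.condition == condition)


# Toy data standing in for the recorded simulation sessions.
trials = [
    Trial("unsafe", True, 14), Trial("unsafe", True, 11),
    Trial("safe", False, 8),   Trial("safe", True, 9),
]

for cond in ("safe", "unsafe"):
    print(f"{cond}: rejection rate = {rejection_rate(trials, cond):.0%}, "
          f"mean AI fixations = {mean_fixations(trials, cond):.1f}")
```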