Benefits and harms associated with the use of AI-related algorithmic decision-making systems by healthcare professionals: a systematic review

IF 13.6, Q1 (Health Care Sciences & Services)

Lancet Regional Health-Europe. Pub Date: 2024-12-01. DOI: 10.1016/j.lanepe.2024.101145

Christoph Wilhelm, Anke Steckelberg, Felix G. Rebitschek

Volume 48, Article 101145. Full text: https://www.sciencedirect.com/science/article/pii/S2666776224003144

Citation count: 0

Abstract

Background

Despite notable advancements in artificial intelligence (AI) that enable complex systems to perform certain tasks more accurately than medical experts, the impact on patient-relevant outcomes remains uncertain. To address this gap, this systematic review assesses the benefits and harms associated with AI-related algorithmic decision-making (ADM) systems used by healthcare professionals, compared to standard care.

Methods

In accordance with the PRISMA guidelines, we included interventional and observational studies published as peer-reviewed full-text articles that met the following criteria: human patients; interventions involving algorithmic decision-making systems, developed with and/or utilizing machine learning (ML); and outcomes describing patient-relevant benefits and harms that directly affect health and quality of life, such as mortality and morbidity. Studies that did not undergo preregistration, lacked a standard-of-care control, or pertained to systems that assist in the execution of actions (e.g., in robotics) were excluded. We searched MEDLINE, EMBASE, IEEE Xplore, and Google Scholar for studies published in the past decade up to 31 March 2024. We assessed risk of bias using Cochrane's RoB 2 and ROBINS-I tools, and reporting transparency with CONSORT-AI and TRIPOD-AI. Two researchers independently managed the processes and resolved conflicts through discussion. This review has been registered with PROSPERO (CRD42023412156) and the study protocol has been published.
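The inclusion and exclusion criteria above can be read as a single screening predicate. The following is a minimal sketch under stated assumptions, not the reviewers' actual screening tool: the `StudyRecord` type and all of its field names are hypothetical, chosen only to mirror the criteria listed in the Methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyRecord:
    """Hypothetical screening record; field names are illustrative only."""
    peer_reviewed_full_text: bool
    human_patients: bool
    uses_ml_based_adm: bool                  # ADM system developed with and/or using ML
    reports_patient_relevant_outcome: bool   # e.g. mortality, morbidity
    preregistered: bool
    has_standard_of_care_control: bool
    assists_action_execution: bool           # e.g. robotics


def is_eligible(study: StudyRecord) -> bool:
    """Return True if the record passes every inclusion and exclusion criterion."""
    meets_inclusion = (
        study.peer_reviewed_full_text
        and study.human_patients
        and study.uses_ml_based_adm
        and study.reports_patient_relevant_outcome
    )
    triggers_exclusion = (
        not study.preregistered
        or not study.has_standard_of_care_control
        or study.assists_action_execution
    )
    return meets_inclusion and not triggers_exclusion


# A record that satisfies all criteria, and one that fails on preregistration.
candidate = StudyRecord(True, True, True, True, True, True, False)
not_preregistered = StudyRecord(True, True, True, True, False, True, False)
```

In this sketch a single failed criterion is enough to exclude a record, which matches the way the Methods state the criteria as jointly required.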

Findings

Out of 2,582 records identified after deduplication, 18 randomized controlled trials (RCTs) and one cohort study met the inclusion criteria, covering specialties such as psychiatry, oncology, and internal medicine. Collectively, the studies included a median of 243 patients (IQR 124–828), with a median of 50.5% female participants (range 12.5–79.0, IQR 43.6–53.6) across intervention and control groups. Four studies were classified as having low risk of bias, seven showed some concerns, and another seven were assessed as having high or serious risk of bias. Reporting transparency varied considerably: six studies showed high compliance, four moderate, and five low compliance with CONSORT-AI or TRIPOD-AI. Twelve studies (63%) reported patient-relevant benefits. Of those with low risk of bias, interventions reduced length of stay in hospital and intensive care unit (10.3 vs. 13.0 days, p = 0.042; 6.3 vs. 8.4 days, p = 0.030), in-hospital mortality (9.0% vs. 21.3%, p = 0.018), and depression symptoms in non-complex cases (45.1% vs. 52.3%, p = 0.03). However, harms were frequently underreported, with only eight studies (42%) documenting adverse events. No study reported an increase in adverse events as a result of the interventions.
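The percentages and group differences reported above can be rechecked with simple arithmetic. The sketch below is illustrative only: it uses the figures stated in the Findings, the helper names are hypothetical, and the absolute differences are plain subtractions of the reported group values, not effect estimates taken from the review.

```python
TOTAL_STUDIES = 18 + 1  # 18 RCTs plus one cohort study


def share_percent(count: int, total: int = TOTAL_STUDIES) -> int:
    """Share of included studies, rounded to a whole percent."""
    return round(100 * count / total)


def absolute_difference(intervention: float, control: float) -> float:
    """Control minus intervention, rounded to one decimal place."""
    return round(control - intervention, 1)


benefit_share = share_percent(12)        # studies reporting patient-relevant benefits
adverse_event_share = share_percent(8)   # studies documenting adverse events

# Reported (intervention, control) values from the low-risk-of-bias studies.
reported = {
    "hospital length of stay (days)": (10.3, 13.0),
    "ICU length of stay (days)": (6.3, 8.4),
    "in-hospital mortality (%)": (9.0, 21.3),
    "depression symptoms, non-complex cases (%)": (45.1, 52.3),
}
differences = {
    name: absolute_difference(intervention, control)
    for name, (intervention, control) in reported.items()
}

# Risk-of-bias counts as stated; they sum to 18 of the 19 included studies.
risk_of_bias_counts = {"low": 4, "some concerns": 7, "high or serious": 7}
rated_total = sum(risk_of_bias_counts.values())
```

The computed shares reproduce the 63% and 42% given in the text. The differences for the two percentage outcomes are in percentage points, not relative reductions.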

Interpretation

The current evidence on AI-related ADM systems provides limited insights into patient-relevant outcomes. Our findings underscore the essential need for rigorous evaluations of clinical benefits, reinforced compliance with methodological standards, and balanced consideration of both benefits and harms to ensure meaningful integration into healthcare practice.

Funding

This study did not receive any funding.

Source journal

Lancet Regional Health-Europe Multiple-

CiteScore

19.90

Self-citation rate

1.40%

Articles published

260

Review time

9 weeks

About the journal: The Lancet Regional Health – Europe, a gold open access journal, is part of The Lancet's global effort to promote healthcare quality and accessibility worldwide. It focuses on advancing clinical practice and health policy in the European region to enhance health outcomes. The journal publishes high-quality original research advocating changes in clinical practice and health policy. It also includes reviews, commentaries, and opinion pieces on regional health topics, such as infection and disease prevention, healthy aging, and reducing health disparities.