Implementing an AI algorithm in the clinical setting: a case study for the accuracy paradox.

IF 4.7 | Zone 2 (Medicine) | JCR Q1: RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

European Radiology Pub Date : 2025-07-01 Epub Date: 2024-12-31 DOI:10.1007/s00330-024-11332-z

John A Scaringi, Ryan A McTaggart, Matthew D Alvin, Michael Atalay, Michael H Bernstein, Mahesh V Jayaraman, Gaurav Jindal, Jonathan S Movson, David W Swenson, Grayson L Baird

European Radiology, pages 4347-4353. Journal Article. https://doi.org/10.1007/s00330-024-11332-z


Abstract

Objectives: We report our experience implementing an algorithm for the detection of large vessel occlusion (LVO) for suspected stroke in the emergency setting, including its performance, and offer an explanation as to why it was poorly received by radiologists.

Materials and methods: An algorithm for the detection of LVO on CT angiography (CTA) was deployed in the emergency room at a single tertiary care hospital between September 1 and 27, 2021. A retrospective analysis of the algorithm's accuracy was performed.

Results: During the study period, 48 patients (60.3 years ± 18.2; 32 women) underwent CTA examination in the emergency department to evaluate for emergent LVO, with 2 positive cases. The LVO algorithm demonstrated a sensitivity of 100% and a specificity of 92%. While the sensitivity of the algorithm at our institution was even higher than the manufacturer's reported values, the false discovery rate was 67%, leading to the perception that the algorithm was inaccurate. In addition, the positive predictive value at our institution was 33%, compared with the manufacturer's reported values of 95-98%. This disparity can be attributed to a difference in disease prevalence: 4.1% at our institution compared with 45.0-62.2% in the manufacturer's reported data.
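The link between prevalence and positive predictive value follows from Bayes' theorem. As a rough check, the rounded figures from the abstract can be substituted into it. The symbols Se (sensitivity), Sp (specificity), and p (prevalence) are notation introduced here for illustration, not taken from the paper.

```latex
\[
\mathrm{PPV}
  = \frac{\mathrm{Se}\,p}{\mathrm{Se}\,p + (1-\mathrm{Sp})(1-p)}
  = \frac{1.00 \times 0.041}{1.00 \times 0.041 + 0.08 \times 0.959}
  \approx 0.35,
\qquad
\mathrm{FDR} = 1 - \mathrm{PPV} \approx 0.65
\]
```

These values are close to the reported 33% and 67%; the small gap is presumably due to rounding of the published sensitivity, specificity, and prevalence.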

Conclusion: Although the LVO algorithm performed as accurately as advertised, it was perceived as inaccurate because it produced more false positives than anticipated, and it was removed from clinical practice. This was likely due to a cognitive bias called the accuracy paradox. To mitigate the accuracy paradox, radiologists evaluating and using artificial intelligence tools should be presented with metrics based on a disease prevalence similar to that of their own practice.
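One way to present metrics at a practice's own prevalence is to recompute the false discovery rate before deployment. The following is a minimal sketch, not the authors' method: it assumes sensitivity and specificity stay fixed across sites (an idealization), and the function names `ppv` and `fdr` are hypothetical. The inputs are the rounded values from the abstract.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' theorem.

    Assumes sensitivity and specificity do not change with prevalence,
    which is an idealization made for this sketch.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no positive calls expected; PPV is undefined")
    return true_pos / (true_pos + false_pos)


def fdr(sensitivity: float, specificity: float, prevalence: float) -> float:
    """False discovery rate: the share of positive calls that are wrong."""
    return 1.0 - ppv(sensitivity, specificity, prevalence)


if __name__ == "__main__":
    # Local prevalence (4.1%) versus the manufacturer's range (45.0-62.2%),
    # all evaluated with the locally observed sensitivity and specificity.
    for prevalence in (0.041, 0.450, 0.622):
        rate = fdr(sensitivity=1.00, specificity=0.92, prevalence=prevalence)
        print(f"prevalence {prevalence:6.1%} -> expected FDR {rate:6.1%}")
```

Under these assumptions the same algorithm yields a high expected false discovery rate at low prevalence and a low one at the manufacturer's prevalence, which is the gap in expectations the conclusion describes.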

Key points: Question: An artificial intelligence algorithm for detecting emergent LVOs was implemented in an emergency department, but it was perceived to be inaccurate. Findings: Although the algorithm's accuracy was both high and as advertised, the algorithm demonstrated a high false discovery rate. Clinical relevance: The misperception of the algorithm's inaccuracy was likely due to a special case of the base rate fallacy, the accuracy paradox. Equipping radiologists with an algorithm's false discovery rate based on local prevalence will ensure realistic expectations for real-world performance.

Source journal: European Radiology (Medicine: Nuclear Medicine)

CiteScore: 11.60

Self-citation rate: 8.50%

Articles published: 874

Review time: 2-4 weeks

About the journal: European Radiology (ER) continuously updates scientific knowledge in radiology by publication of strong original articles and state-of-the-art reviews written by leading radiologists. A well-balanced combination of review articles, original papers, short communications from European radiological congresses and information on society matters makes ER an indispensable source for current information in this field. This is the Journal of the European Society of Radiology, and the official journal of a number of societies. From 2004 to 2008, supplements to European Radiology were published under its companion, European Radiology Supplements, ISSN 1613-3749.