Systematic Review of Artificial Intelligence for Abnormality Detection in High-volume Neuroimaging and Subgroup Meta-analysis for Intracranial Hemorrhage Detection.

IF 2.8 · Zone 3 (Medicine) · Q2 Medicine

Clinical Neuroradiology · Pub Date: 2023-12-01 · Epub Date: 2023-06-01 · DOI: 10.1007/s00062-023-01291-1

Siddharth Agarwal, David Wood, Mariusz Grzeda, Chandhini Suresh, Munaib Din, James Cole, Marc Modat, Thomas C Booth

Pages: 943-956 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10233528/pdf/

Citations: 0

Abstract

Purpose: Most studies evaluating artificial intelligence (AI) models that detect abnormalities in neuroimaging are either tested on unrepresentative patient cohorts or insufficiently validated, leading to poor generalisability to real-world tasks. The aim was to determine the diagnostic test accuracy of, and summarise the evidence supporting the use of, AI models performing first-line, high-volume neuroimaging tasks.

Methods: Medline, Embase, Cochrane Library and Web of Science were searched up to September 2021 for studies that temporally or externally validated AI capable of detecting abnormalities in first-line computed tomography (CT) or magnetic resonance (MR) neuroimaging. A bivariate random-effects model was used for meta-analysis where appropriate. This study was registered on PROSPERO as CRD42021269563.
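The bivariate random-effects model mentioned in the Methods pools logit-transformed sensitivity and specificity jointly and models their between-study correlation; fitting it properly requires (restricted) maximum likelihood, usually via a dedicated statistics package. As a deliberately simplified illustration only, the sketch below applies the univariate DerSimonian-Laird random-effects estimator to logit-transformed proportions, one outcome at a time. Unlike the bivariate model, it ignores the correlation between sensitivity and specificity, so it is not the review's method. The function names, the 0.5 continuity correction and the counts are hypothetical assumptions, not data from the review.

```python
import math
from typing import Sequence, Tuple

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logit_with_variance(events: int, total: int) -> Tuple[float, float]:
    """Logit of a proportion and its approximate within-study variance.

    Assumption: a 0.5 continuity correction is added to both cells for every
    study so that 0% or 100% never produces an infinite logit.
    """
    e = events + 0.5
    non = (total - events) + 0.5
    return math.log(e / non), 1.0 / e + 1.0 / non


def dersimonian_laird(y: Sequence[float], v: Sequence[float]) -> Tuple[float, float, float]:
    """Univariate DerSimonian-Laird pooling: returns (estimate, SE, tau^2)."""
    k = len(y)
    w = [1.0 / vi for vi in v]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]  # random-effects weights
    sw_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sw_re
    return pooled, math.sqrt(1.0 / sw_re), tau2


def pool_proportions(studies: Sequence[Tuple[int, int]]) -> Tuple[float, float, float]:
    """Pool (events, total) pairs on the logit scale; return (estimate, lower, upper)."""
    pairs = [logit_with_variance(e, n) for e, n in studies]
    y = [p[0] for p in pairs]
    v = [p[1] for p in pairs]
    pooled, se, _ = dersimonian_laird(y, v)
    return (inv_logit(pooled),
            inv_logit(pooled - Z_95 * se),
            inv_logit(pooled + Z_95 * se))


if __name__ == "__main__":
    # Hypothetical (true positives, true positives + false negatives) per study;
    # these are NOT data from the review.
    sens_counts = [(90, 100), (45, 50), (170, 200)]
    est, lo, hi = pool_proportions(sens_counts)
    print(f"pooled sensitivity {est:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Specificity would be pooled the same way from (true negatives, true negatives + false positives) pairs; a bivariate fit would estimate both at once.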

Results: Of 42,870 records screened and 5734 potentially eligible full texts, only 16 studies were eligible for inclusion. Included studies were not compromised by unrepresentative datasets or inadequate validation methodology. Direct comparison with radiologists was available in 4/16 studies, and 15/16 studies had a high risk of bias. Meta-analysis was only suitable for intracranial hemorrhage detection in CT imaging (10/16 studies), where AI systems had a pooled sensitivity and specificity of 0.90 (95% confidence interval [CI] 0.85-0.94) and 0.90 (95% CI 0.83-0.95), respectively. Other AI studies using CT and MRI detected target conditions other than hemorrhage (2/16) or multiple target conditions (4/16). Only 3/16 studies implemented AI in clinical pathways, either for pre-read triage or as post-read discrepancy identifiers.
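Pooled figures such as the sensitivity and specificity reported in the Results are built from per-study 2×2 tables of AI output against the reference standard. The sketch below shows how a single study's sensitivity and specificity, with 95% Wilson score intervals, could be computed. The `TwoByTwo` helper and the counts are hypothetical, and the choice of the Wilson interval is an assumption; the included studies and the review may have computed their intervals differently.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical 2x2 table: AI output versus the reference standard."""
    tp: int  # hemorrhage present, AI positive
    fp: int  # hemorrhage absent, AI positive
    fn: int  # hemorrhage present, AI negative
    tn: int  # hemorrhage absent, AI negative


def wilson_ci(successes: int, n: int, z: float = Z_95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion, clamped to [0, 1]."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def sensitivity(t: TwoByTwo) -> float:
    return t.tp / (t.tp + t.fn)  # proportion of true hemorrhages flagged


def specificity(t: TwoByTwo) -> float:
    return t.tn / (t.tn + t.fp)  # proportion of hemorrhage-free scans cleared


if __name__ == "__main__":
    study = TwoByTwo(tp=90, fp=20, fn=10, tn=180)  # made-up counts for illustration
    lo, hi = wilson_ci(study.tp, study.tp + study.fn)
    print(f"sensitivity {sensitivity(study):.2f} (95% CI {lo:.2f}-{hi:.2f})")
    lo, hi = wilson_ci(study.tn, study.tn + study.fp)
    print(f"specificity {specificity(study):.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The Wilson interval is used here because, unlike the simple Wald interval, it stays within [0, 1] and behaves reasonably when a proportion is close to 0 or 1.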

Conclusion: The paucity of eligible studies indicates that most abnormality detection AI studies were not adequately validated in representative clinical cohorts. The few studies describing how abnormality detection AI could impact patients and clinicians did not explore the full ramifications of clinical implementation.


Source journal

Clinical Neuroradiology (Medicine: Radiology, Nuclear Medicine and Imaging)

CiteScore: 4.90

Self-citation rate: 3.60%

Journal description: Clinical Neuroradiology provides current information, original contributions, and reviews in the field of neuroradiology. An interdisciplinary approach is achieved through diagnostic and therapeutic contributions on related subjects. The journal's international coverage and relevance are underlined by its status as the official journal of the German, Swiss, and Austrian Societies of Neuroradiology.