Detecting papilloedema as a marker of raised intracranial pressure using artificial intelligence: A systematic review.

Impact factor: 7.7

PLOS digital health. Pub Date: 2025-09-02. eCollection Date: 2025-09-01. DOI: 10.1371/journal.pdig.0000783

Lekaashree Rambabu, Thomas Edmiston, Brandon G Smith, Katharina Kohler, Angelos G Kolias, Richard A I Bethlehem, Pearse A Keane, Hani J Marcus, EyeVu Consortium, Peter J Hutchinson, Tom Bashford

Journal Article. PLOS digital health, volume 4, issue 9, e0000783. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12404415/pdf/


Abstract

Automated detection of papilloedema using artificial intelligence (AI) and retinal images acquired through an ophthalmoscope could be beneficial for triage of patients with potential intracranial pathology, particularly in resource-limited settings where access to neuroimaging may be restricted. However, a comprehensive overview of the current literature in this field is lacking. We conducted a systematic review of the use of AI for papilloedema detection by searching four databases: Ovid MEDLINE, Embase, Web of Science, and IEEE Xplore. Included studies were assessed for quality of reporting using the Checklist for AI in Medical Imaging and appraised for the presence of bias using a novel 5-domain rubric, 'SMART'. For a subset of studies, we also assessed diagnostic test accuracy using the 'Metadta' command in Stata. Nineteen deep learning systems and eight non-deep learning systems were included. The median number of images of normal optic discs was 2509 (IQR 580-9156) in the training set and 569 (IQR 119-1378) in the testing set. The number of papilloedema images was lower, with a median of 1292 (IQR 201-2882) in the training set and 201 (IQR 57-388) in the testing set. Age and gender were the two most frequently reported demographic variables, included by one-third of the studies. Only ten studies performed external validation. The pooled sensitivity and specificity were 0.87 [95% CI 0.76-0.93] and 0.90 [95% CI 0.74-0.97], respectively. Though reported AI model performance is high, these results need to be interpreted with caution due to highly biased data selection, poor quality of reporting, and limited evidence of reproducibility. Deep learning models show promise in retinal image analysis of papilloedema; however, external validation using large, diverse datasets in a variety of clinical settings is required before they can be considered a tool for triage of intracranial pathologies in resource-limited areas.
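
For readers less familiar with the accuracy measures quoted above, the following is a minimal sketch of how sensitivity and specificity are obtained from one study's 2x2 table, with a Wilson score interval around each. It is not the meta-analytic pooling the review performed in Stata. The counts are hypothetical and chosen only for illustration; the function names `sensitivity_specificity` and `wilson_interval` are likewise illustrative, not taken from the paper.

```python
import math


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from 2x2 confusion counts."""
    sensitivity = tp / (tp + fn)  # papilloedema images correctly flagged
    specificity = tn / (tn + fp)  # normal discs correctly cleared
    return sensitivity, specificity


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Hypothetical single-study counts (assumption, not data from the review).
tp, fn, tn, fp = 175, 26, 512, 57

sens, spec = sensitivity_specificity(tp, fn, tn, fp)
sens_ci = wilson_interval(tp, tp + fn)
spec_ci = wilson_interval(tn, tn + fp)

print(f"sensitivity = {sens:.2f}, 95% CI {sens_ci[0]:.2f}-{sens_ci[1]:.2f}")
print(f"specificity = {spec:.2f}, 95% CI {spec_ci[0]:.2f}-{spec_ci[1]:.2f}")
```

A pooled estimate across studies additionally has to account for between-study variation, which is why pooled intervals such as those in the abstract are typically wider than a single large study would give.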


