Lekaashree Rambabu, Thomas Edmiston, Brandon G Smith, Katharina Kohler, Angelos G Kolias, Richard A I Bethlehem, Pearse A Keane, Hani J Marcus, EyeVu Consortium, Peter J Hutchinson, Tom Bashford
Detecting papilloedema as a marker of raised intracranial pressure using artificial intelligence: A systematic review.
PLOS Digital Health, 4(9): e0000783. Published 2025-09-02. https://doi.org/10.1371/journal.pdig.0000783
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12404415/pdf/
Abstract
Automated detection of papilloedema using artificial intelligence (AI) and retinal images acquired through an ophthalmoscope could be beneficial for triaging patients with potential intracranial pathology, particularly in resource-limited settings where access to neuroimaging may be limited. However, a comprehensive overview of the current literature in this field is lacking. We conducted a systematic review of the use of AI for papilloedema detection by searching four databases: Ovid MEDLINE, Embase, Web of Science, and IEEE Xplore. Included studies were assessed for quality of reporting using the Checklist for AI in Medical Imaging and appraised for the presence of bias using a novel 5-domain rubric, 'SMART'. For a subset of studies, we also assessed diagnostic test accuracy using the 'Metadta' command in Stata. Nineteen deep learning systems and eight non-deep learning systems were included. The median number of images of normal optic discs was 2509 (IQR 580-9156) in the training set and 569 (IQR 119-1378) in the testing set. The number of papilloedema images was lower, with a median of 1292 (IQR 201-2882) in the training set and 201 (IQR 57-388) in the testing set. Age and gender were the two most frequently reported demographic variables, each included by one-third of the studies. Only ten studies performed external validation. The pooled sensitivity and specificity were 0.87 [95% CI 0.76-0.93] and 0.90 [95% CI 0.74-0.97], respectively. Though reported AI model performance is high, these results need to be interpreted with caution due to highly biased data selection, poor quality of reporting, and limited evidence of reproducibility.
Deep learning models show promise in retinal image analysis of papilloedema; however, external validation using large, diverse datasets in a variety of clinical settings is required before they can be considered a tool for triage of intracranial pathologies in resource-limited areas.
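
To make the pooled sensitivity and specificity figures concrete, the following is a minimal sketch of how per-study 2x2 counts can be combined into a pooled proportion with a 95% confidence interval. It is a simplification and an assumption on our part: it uses fixed-effect inverse-variance pooling on the logit scale, whereas the review reports using the 'Metadta' command in Stata, whose model is not reproduced here. The function name `pool_proportion` and the study counts are hypothetical and are not data from the review.

```python
import math


def inv_logit(x):
    """Map a logit-scale value back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_proportion(events, totals):
    """Fixed-effect inverse-variance pooling of proportions on the logit scale.

    Returns (estimate, lower_95, upper_95). This is an illustrative
    simplification, not the model fitted in the review.
    """
    weights = []
    estimates = []
    for e, n in zip(events, totals):
        # A 0.5 continuity correction keeps the logit finite when e == 0 or e == n.
        e_c = e + 0.5
        f_c = (n - e) + 0.5
        estimates.append(math.log(e_c / f_c))
        variance = 1.0 / e_c + 1.0 / f_c
        weights.append(1.0 / variance)
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return (
        inv_logit(pooled),
        inv_logit(pooled - 1.96 * se),
        inv_logit(pooled + 1.96 * se),
    )


# Hypothetical per-study counts (TP, FN, TN, FP); NOT taken from the review.
studies = [(90, 10, 180, 20), (45, 5, 95, 5), (160, 40, 350, 50)]

# Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
sensitivity = pool_proportion(
    [tp for tp, fn, tn, fp in studies],
    [tp + fn for tp, fn, tn, fp in studies],
)
specificity = pool_proportion(
    [tn for tp, fn, tn, fp in studies],
    [tn + fp for tp, fn, tn, fp in studies],
)

print("pooled sensitivity: %.2f [95%% CI %.2f-%.2f]" % sensitivity)
print("pooled specificity: %.2f [95%% CI %.2f-%.2f]" % specificity)
```

Pooling on the logit scale keeps the confidence limits inside the 0 to 1 range, which is why intervals such as those quoted above are asymmetric around the point estimate. A random-effects or bivariate model would generally give wider intervals when studies are heterogeneous.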