Shahrzad Gholami PhD, Lea Scheppke PhD, Meghana Kshirsagar PhD, Yue Wu PhD, Rahul Dodhia PhD, Roberto Bonelli PhD, Irene Leung BA, Ferenc B. Sallo MD, PhD, Alyson Muldrew PhD, Catherine Jamison MSc, Tunde Peto MD, PhD, Juan Lavista Ferres PhD, William B. Weeks MD, PhD, Martin Friedlander MD, PhD, Aaron Y. Lee MD, MSCI, Lowy Medical Research Institute
Enhanced Macular Telangiectasia Type 2 Detection: Leveraging Self-Supervised Learning and Ensemble Models
Ophthalmology Science 5(4), Article 100710. Published 2025-01-13. DOI: 10.1016/j.xops.2025.100710
https://www.sciencedirect.com/science/article/pii/S2666914525000089
Abstract
Objective
To investigate an ensemble-based approach utilizing deep learning models for accurate and interpretable detection of macular telangiectasia (MacTel) type 2 on OCT imaging.
Design
Retrospective analysis of OCT scans, model development, and assessment.
Participants
A total of 5200 OCT images from participants in the MacTel Registry conducted by the Lowy Medical Research Institute and from the University of Washington (780 MacTel patients and 1900 non-MacTel patients).
Methods, Intervention, or Testing
We trained multiple individual MacTel vs. non-MacTel classification models using traditional supervised learning and self-supervised learning (SSL) and ensembled them using average weighting. We investigated several ways of constructing the ensemble, including varying the architectures and learning paradigms of the individual models and varying the amount of labeled data available for training. Model performance was compared against human expert graders on held-out test set data. Model interpretability was investigated using gradient-weighted class activation map (Grad-CAM) visualization and by evaluating interrater agreement.
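As a rough illustration of ensembling by averaging, the sketch below gives every model equal weight and averages the per-sample probabilities. The model names, probabilities, and the 0.5 decision threshold are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def average_ensemble(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average per-sample MacTel probabilities across models with equal weights.

    prob_sets[m][i] is model m's predicted probability for sample i.
    """
    if not prob_sets:
        raise ValueError("need at least one model")
    n_samples = len(prob_sets[0])
    if any(len(p) != n_samples for p in prob_sets):
        raise ValueError("all models must score the same samples")
    n_models = len(prob_sets)
    return [sum(p[i] for p in prob_sets) / n_models for i in range(n_samples)]


def classify(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a sample 1 (MacTel) when its averaged probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical outputs from one supervised and two SSL models on four OCT samples.
supervised = [0.90, 0.20, 0.60, 0.10]
ssl_a = [0.80, 0.40, 0.40, 0.20]
ssl_b = [0.70, 0.30, 0.60, 0.00]

ensemble_probs = average_ensemble([supervised, ssl_a, ssl_b])
ensemble_labels = classify(ensemble_probs)
```

Averaging probabilities rather than hard votes keeps each model's confidence in play, so a single uncertain model is less likely to flip the ensemble decision.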
Main Outcome Measures
For model performance, area under the receiver operating characteristic curve (AUROC), area under the precision–recall curve (AUPRC), accuracy, sensitivity, and specificity were reported. For interpretability, interrater agreements and Grad-CAM visualization results were evaluated.
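The abstract does not name the agreement statistic that was used. As an assumption, the sketch below computes Cohen's kappa, a common chance-corrected measure of agreement between two raters (here, a model and a human grader). The label lists are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Fraction of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels (1 = MacTel, 0 = non-MacTel) for ten scans.
model_labels = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
grader_labels = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]

kappa = cohen_kappa(model_labels, grader_labels)
```

In this toy example the raters agree on 8 of 10 scans, chance agreement is 0.52, and kappa is (0.80 - 0.52) / (1 - 0.52), about 0.58.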
Results
Despite access to only 419 OCT volumes, including 185 MacTel patients within the 10% labeled training dataset, the ensemble model achieved performance (AUROC 0.972 [95% confidence interval (CI), 0.971–0.973], AUPRC 0.967 [95% CI, 0.965–0.969], accuracy 91.7%, sensitivity 0.905, and specificity 0.925) comparable to that of the human expert ensemble (AUROC 0.977 [95% CI, 0.975–0.978], AUPRC 0.987 [95% CI, 0.986–0.987], accuracy 96.8%, sensitivity 0.929, and specificity 1) on a test set of 500 patients. Evaluated separately, the individual models did not reach the same performance.
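For readers less familiar with these figures, the sketch below shows how accuracy, sensitivity, specificity, and AUROC are derived from true labels and model scores. It uses the standard definitions, not the study's evaluation code. The data and the 0.5 threshold are hypothetical, and AUPRC is omitted for brevity.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity from binary labels (1 = MacTel)."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1:
            if pred == 1:
                tp += 1
            else:
                fn += 1
        else:
            if pred == 1:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # share of MacTel cases detected
        "specificity": tn / (tn + fp),  # share of non-MacTel cases cleared
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test labels and model scores for seven patients.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = confusion_metrics(y_true, y_pred)
auc = auroc(y_true, scores)
```

The pairwise AUROC above is quadratic in the number of samples, which is fine for a test set of a few hundred patients. A rank-based formulation would be preferable for much larger sets.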
Conclusions
Even with limited data, combining SSL with ensemble approaches improved MacTel classification accuracy and interpretation compared with the individual models. Self-supervised learning captures meaningful representations from unlabeled data, a key benefit when labeled data are limited, as with rare diseases.
Financial Disclosure(s)
Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.