Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification

Fatema-E-Jannat, Sina Gholami, Jennifer I. Lim, Theodore Leng, Minhaj Nur Alam, Hamed Tabkhi

arXiv:2409.11375 (cs.CV), 17 September 2024
Abstract
In the medical domain, acquiring large datasets poses significant challenges due to privacy concerns; nonetheless, developing a robust deep-learning model for retinal disease diagnosis requires a substantial training dataset. Generalizing effectively from smaller datasets remains a persistent challenge, and this scarcity of data is a major barrier to the practical deployment of scalable medical AI solutions. To address this issue, we combine a wide range of data sources and develop a self-supervised framework built on the SwinV2 transformer, giving the model a deeper understanding of multi-modal data representations and enhancing its ability to generalize to new data for detecting eye diseases from optical coherence tomography (OCT) images. We adopt a two-phase training methodology: self-supervised pre-training followed by fine-tuning of a downstream supervised classifier. An ablation study across three datasets, covering different encoder backbones, training without data fusion, low-data-availability settings, and training without self-supervised pre-training, highlights the robustness of our method. Our model performs consistently across these diverse conditions and generalizes better than the baseline model, ResNet-50.
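The two-phase methodology (self-supervised pre-training, then supervised fine-tuning) can be sketched in a few lines of PyTorch. The sketch below is illustrative only: the abstract does not state the pre-training objective, so a simple masked-reconstruction loss stands in for it, and the timm SwinV2 variant, toy pixel decoder, learning rates, and four-class label set are assumptions rather than the authors' configuration.

```python
# Hypothetical sketch of the two-phase training methodology from the abstract.
# Phase 1: self-supervised pre-training of a SwinV2 encoder on pooled OCT data;
# Phase 2: fine-tuning the pre-trained encoder with a supervised classifier head.
import torch
import torch.nn as nn
import timm  # assumed available; provides SwinV2 backbones

# SwinV2 backbone (as named in the abstract); num_classes=0 yields pooled features.
encoder = timm.create_model("swinv2_tiny_window8_256", pretrained=False, num_classes=0)
feat_dim = encoder.num_features

# Multi-source "data fusion" here would simply mean pooling datasets, e.g.:
# pooled = torch.utils.data.ConcatDataset([ds_a, ds_b, ds_c])  # assumption

# ---- Phase 1: self-supervised pre-training (masked reconstruction, assumed) ----
decoder = nn.Linear(feat_dim, 3 * 256 * 256)  # toy pixel decoder for the sketch
opt = torch.optim.AdamW(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)

def pretrain_step(images: torch.Tensor) -> torch.Tensor:
    """One self-supervised step: mask pixels, reconstruct the full image."""
    mask = (torch.rand_like(images[:, :1]) > 0.5).float()  # random 50% mask
    feats = encoder(images * mask)                          # encode masked input
    recon = decoder(feats).view_as(images)                  # predict pixels
    loss = nn.functional.mse_loss(recon, images)            # reconstruction loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss

# ---- Phase 2: supervised fine-tuning on the downstream classifier ----
num_classes = 4  # placeholder disease-class count, not taken from the paper
classifier = nn.Linear(feat_dim, num_classes)
ft_opt = torch.optim.AdamW(list(encoder.parameters()) + list(classifier.parameters()), lr=1e-5)

def finetune_step(images: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """One supervised step: classify retinal disease from OCT features."""
    logits = classifier(encoder(images))
    loss = nn.functional.cross_entropy(logits, labels)
    ft_opt.zero_grad(); loss.backward(); ft_opt.step()
    return loss

# Example: one pre-training step on a random batch (shapes only, no real data).
dummy = torch.randn(2, 3, 256, 256)
print(pretrain_step(dummy).item())
```

For simplicity the sketch fine-tunes the entire encoder in Phase 2; in practice one might freeze most encoder layers early in fine-tuning, a choice the abstract does not specify.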