Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification

Fatema-E-Jannat, Sina Gholami, Jennifer I. Lim, Theodore Leng, Minhaj Nur Alam, Hamed Tabkhi

arXiv:2409.11375 (cs.CV), 17 September 2024
Abstract
In the medical domain, acquiring large datasets poses significant challenges due to privacy concerns; nonetheless, developing a robust deep-learning model for retinal disease diagnosis requires a substantial training dataset. Generalizing effectively from smaller datasets remains a persistent challenge, and this scarcity of data is a major barrier to the practical deployment of scalable medical AI solutions. To address this issue, we combine a wide range of data sources and develop a self-supervised framework built on the SwinV2 transformer, giving the model a deeper understanding of multi-modal data representations and enhancing its ability to generalize to new data for detecting eye diseases from optical coherence tomography (OCT) images. We adopt a two-phase training methodology: self-supervised pre-training followed by fine-tuning of a downstream supervised classifier. An ablation study across three datasets, covering different encoder backbones, training without data fusion, low-data-availability settings, and training without self-supervised pre-training, highlights the robustness of our method. Our model performs consistently across these diverse conditions and generalizes better than the baseline model, ResNet-50.
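The two-phase methodology (self-supervised pre-training, then supervised fine-tuning) can be sketched in a few lines of PyTorch. The sketch below is illustrative only: the abstract does not state the pre-training objective, so a simple masked-reconstruction loss stands in for it, and the timm SwinV2 variant, toy pixel decoder, learning rates, and four-class label set are assumptions rather than the authors' configuration.

```python
# Hypothetical sketch of the two-phase training methodology from the abstract.
# Phase 1: self-supervised pre-training of a SwinV2 encoder on pooled OCT data;
# Phase 2: fine-tuning the pre-trained encoder with a supervised classifier head.
import torch
import torch.nn as nn
import timm  # assumed available; provides SwinV2 backbones

# SwinV2 backbone (as named in the abstract); num_classes=0 yields pooled features.
encoder = timm.create_model("swinv2_tiny_window8_256", pretrained=False, num_classes=0)
feat_dim = encoder.num_features

# Multi-source "data fusion" here would simply mean pooling datasets, e.g.:
# pooled = torch.utils.data.ConcatDataset([ds_a, ds_b, ds_c])  # assumption

# ---- Phase 1: self-supervised pre-training (masked reconstruction, assumed) ----
decoder = nn.Linear(feat_dim, 3 * 256 * 256)  # toy pixel decoder for the sketch
opt = torch.optim.AdamW(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)

def pretrain_step(images: torch.Tensor) -> torch.Tensor:
    """One self-supervised step: mask pixels, reconstruct the full image."""
    mask = (torch.rand_like(images[:, :1]) > 0.5).float()  # random 50% mask
    feats = encoder(images * mask)                          # encode masked input
    recon = decoder(feats).view_as(images)                  # predict pixels
    loss = nn.functional.mse_loss(recon, images)            # reconstruction loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss

# ---- Phase 2: supervised fine-tuning on the downstream classifier ----
num_classes = 4  # placeholder disease-class count, not taken from the paper
classifier = nn.Linear(feat_dim, num_classes)
ft_opt = torch.optim.AdamW(list(encoder.parameters()) + list(classifier.parameters()), lr=1e-5)

def finetune_step(images: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """One supervised step: classify retinal disease from OCT features."""
    logits = classifier(encoder(images))
    loss = nn.functional.cross_entropy(logits, labels)
    ft_opt.zero_grad(); loss.backward(); ft_opt.step()
    return loss

# Example: one pre-training step on a random batch (shapes only, no real data).
dummy = torch.randn(2, 3, 256, 256)
print(pretrain_step(dummy).item())
```

For simplicity the sketch fine-tunes the entire encoder in Phase 2; in practice one might freeze most encoder layers early in fine-tuning, a choice the abstract does not specify.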