Fatema-E- Jannat, Sina Gholami, Jennifer I. Lim, Theodore Leng, Minhaj Nur Alam, Hamed Tabkhi
arXiv:2409.11375 · arXiv - CS - Computer Vision and Pattern Recognition · 2024-09-17
Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification
In the medical domain, acquiring large datasets poses significant challenges
due to privacy concerns. Nonetheless, the development of a robust deep-learning
model for retinal disease diagnosis necessitates a substantial dataset for
training. The capacity to generalize effectively on smaller datasets remains a
persistent challenge. The scarcity of data presents a significant barrier to
the practical implementation of scalable medical AI solutions. To address this
issue, we combine a wide range of data sources to improve performance and
generalization to new data, and we develop a self-supervised framework based on
the SwinV2 transformer to gain a deeper understanding of multi-modal dataset
representations, enhancing the model's ability to generalize to new data for
the detection of eye diseases using optical coherence tomography (OCT) images.
We adopt a two-phase training methodology: self-supervised pre-training
followed by fine-tuning of a downstream supervised classifier. An ablation
study across three datasets, covering different encoder backbones as well as
settings without data fusion, with low data availability, and without
self-supervised pre-training, highlights the robustness of our method. Our
findings demonstrate consistent performance across these diverse conditions and
superior generalization compared to the baseline model, ResNet-50.
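The two-phase methodology the abstract describes (self-supervised pre-training, then fine-tuning of a supervised classifier on the pre-trained encoder) can be sketched with a deliberately tiny stand-in. Here a denoising autoencoder plays the role of the paper's SwinV2-based self-supervised pre-training, and a softmax head plays the role of the downstream classifier; all data, dimensions, and the linear encoder are invented for illustration and are not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

D, H, C = 16, 8, 3                       # input dim, hidden dim, num classes
W_enc = rng.normal(0, 0.1, (D, H))       # shared encoder (pre-trained, then reused)
W_dec = rng.normal(0, 0.1, (H, D))       # decoder, used only in phase 1

X_unlab = rng.normal(size=(200, D))      # unlabeled "pre-training" data

def recon_mse(X, W_enc, W_dec):
    """Reconstruction error of the current autoencoder on clean inputs."""
    Z = np.tanh(X @ W_enc)
    return float(((Z @ W_dec - X) ** 2).mean())

init_recon_mse = recon_mse(X_unlab, W_enc, W_dec)

# ---- Phase 1: self-supervised pre-training (denoising reconstruction) ----
lr = 0.05
for _ in range(500):
    noisy = X_unlab + rng.normal(0, 0.1, X_unlab.shape)
    Z = np.tanh(noisy @ W_enc)
    err = Z @ W_dec - X_unlab            # reconstruct clean input from noisy view
    gW_dec = Z.T @ err / len(X_unlab)    # MSE gradient w.r.t. decoder
    gZ = (err @ W_dec.T) * (1 - Z ** 2)  # backprop through tanh
    gW_enc = noisy.T @ gZ / len(X_unlab)
    W_dec -= lr * gW_dec
    W_enc -= lr * gW_enc

final_recon_mse = recon_mse(X_unlab, W_enc, W_dec)

# ---- Phase 2: supervised fine-tuning of a classifier head ----
X_lab = rng.normal(size=(60, D))         # small labeled set
y = X_lab[:, :C].argmax(axis=1)          # synthetic labels for the toy task
W_cls = np.zeros((H, C))
for _ in range(300):
    Zl = np.tanh(X_lab @ W_enc)          # features from the pre-trained encoder
    logits = Zl @ W_cls
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = p.copy()
    grad[np.arange(len(y)), y] -= 1.0    # softmax cross-entropy gradient
    W_cls -= 0.1 * (Zl.T @ grad / len(y))

# Final class probabilities from the fine-tuned pipeline
Zl = np.tanh(X_lab @ W_enc)
logits = Zl @ W_cls
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
```

The key design point mirrored from the abstract is that the encoder weights learned without labels in phase 1 are carried into phase 2, so the supervised classifier starts from representations shaped by the larger unlabeled pool rather than from scratch.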