Bridging the Modality Gap in Multimodal Eye Disease Screening: Learning Modality Shared-Specific Features via Multi-Level Regularization

IF 3.2; Zone 2 (Engineering & Technology); Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Signal Processing Letters Pub Date : 2025-01-06 DOI:10.1109/LSP.2025.3526094

Jiayue Zhao;Shiman Li;Yi Hao;Chenxi Zhang

IEEE Signal Processing Letters, vol. 32, pp. 586-590. Full text: https://ieeexplore.ieee.org/document/10824970/

Citations: 0

Abstract

Color fundus photography (CFP) and optical coherence tomography (OCT) are two common modalities used in eye disease screening, providing crucial complementary information for the diagnosis of eye diseases. However, existing multimodal learning methods cannot fully leverage the information from each modality due to the large dimensional and semantic gap between 2D CFP and 3D OCT images, leading to suboptimal classification performance. To bridge the modality gap and fully exploit the information from each modality, we propose a novel feature disentanglement method that decomposes features into modality-shared and modality-specific components. We design a multi-level regularization strategy including intra-modality, inter-modality, and intra-inter-modality regularization to facilitate the effective learning of the modality Shared-Specific features. Our method achieves state-of-the-art performance on two eye disease diagnosis tasks using two publicly available datasets. Our method promises to serve as a useful tool for multimodal eye disease diagnosis.
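
The abstract names the ingredients (a shared/specific feature split plus intra- and inter-modality regularization) but gives no formulas. The following is therefore only a minimal, generic sketch of shared-specific disentanglement, not the authors' implementation. Every name in it (`make_heads`, `regularizers`, the feature sizes) is hypothetical, and both loss terms are assumptions: a squared-cosine penalty that decorrelates shared and specific parts within a modality, and a cosine alignment term that pulls the two shared parts together across modalities. The intra-inter-modality term is omitted because the abstract does not define it.

```python
import math
import random


def linear(x, w):
    """Apply a bias-free linear map; w has one row per output dimension."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def cosine(a, b):
    """Cosine similarity, with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def random_matrix(rows, cols, rng):
    """Random Gaussian weights standing in for trained projection heads."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def make_heads(cfp_dim, oct_dim, out_dim, rng):
    """Hypothetical projection heads: one shared and one specific per modality."""
    return {
        "cfp_shared": random_matrix(out_dim, cfp_dim, rng),
        "cfp_specific": random_matrix(out_dim, cfp_dim, rng),
        "oct_shared": random_matrix(out_dim, oct_dim, rng),
        "oct_specific": random_matrix(out_dim, oct_dim, rng),
    }


def regularizers(cfp_feat, oct_feat, heads):
    """Compute two illustrative (assumed) regularization terms."""
    cfp_sh = linear(cfp_feat, heads["cfp_shared"])
    cfp_sp = linear(cfp_feat, heads["cfp_specific"])
    oct_sh = linear(oct_feat, heads["oct_shared"])
    oct_sp = linear(oct_feat, heads["oct_specific"])

    # Assumed intra-modality term: shared and specific parts of the same
    # modality should carry different information (low squared cosine).
    intra = cosine(cfp_sh, cfp_sp) ** 2 + cosine(oct_sh, oct_sp) ** 2

    # Assumed inter-modality term: the shared parts of CFP and OCT should
    # agree (cosine close to 1, so the penalty is close to 0).
    inter = 1.0 - cosine(cfp_sh, oct_sh)
    return {"intra": intra, "inter": inter}


# Toy usage: pooled backbone features of different sizes for the 2D CFP
# branch and the 3D OCT branch (sizes are arbitrary placeholders).
rng = random.Random(0)
heads = make_heads(cfp_dim=8, oct_dim=6, out_dim=4, rng=rng)
cfp_feat = [rng.gauss(0.0, 1.0) for _ in range(8)]
oct_feat = [rng.gauss(0.0, 1.0) for _ in range(6)]
print(regularizers(cfp_feat, oct_feat, heads))
```

In a trained model these terms would typically be weighted and added to the classification loss; the weights and the exact form of each term would have to come from the paper itself.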


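Source journal

IEEE Signal Processing Letters (Engineering & Technology: Electrical & Electronic Engineering)

CiteScore: 7.40

Self-citation rate: 12.80%

Articles published: 339

Review time: 2.8 months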

About the journal: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language, and audio processing. Papers published in the Letters can be presented, within one year of their appearance, at signal processing conferences such as ICASSP, GlobalSIP, and ICIP, as well as at several workshops organized by the Signal Processing Society.