{"title":"Escaping Modal Interactions: An Efficient DESANet for Multi-Modal Object Re-Identification","authors":"Wenjiao Dong;Xi Yang;De Cheng;Nannan Wang;Xinbo Gao","doi":"10.1109/TIP.2025.3592575","DOIUrl":null,"url":null,"abstract":"Multi-modal object Re-ID aims to leverage the complementary information provided by multiple modalities to overcome challenging conditions and achieve high-quality object matching. However, existing multi-modal methods typically rely on various modality interaction modules for information fusion, which can reduce the efficiency of real-time monitoring systems. Additionally, practical challenges such as low-quality multi-modal data or missing modalities further complicate the application of object Re-ID. To address these issues, we propose the Complementary Data Enhancement and Modal-Aware Soft Alignment Network (DESANet), which is designed to be independent of interactive networks and adaptable to scenarios with missing modalities. This approach ensures a simple-yet-effective, and efficient multi-modal object Re-ID. DESANet consists of three key components: Firstly, the Dual-Color Space Data Enhancement (DCDE) module, which enhances multi-modal data by performing patch rotation in the RGB space and improving image quality in the HSV space. Secondly, the Salient Feature ReConstruction (SFRC) module, which addresses the issue of missing modalities by reconstructing features from one modality using the other two. Thirdly, the Modal-Aware Soft Alignment (MASA) module, which integrates multi-source data to avoid the blind fusion of features and prevents the propagation of noise from reconstructed modalities. Our approach achieves state-of-the-art performances on both person and vehicle datasets. Source code is available at <uri>https://github.com/DWJ11/DESANet</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"5068-5083"},"PeriodicalIF":13.7000,"publicationDate":"2025-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11104996/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Multi-modal object Re-ID aims to leverage the complementary information provided by multiple modalities to overcome challenging conditions and achieve high-quality object matching. However, existing multi-modal methods typically rely on various modality interaction modules for information fusion, which can reduce the efficiency of real-time monitoring systems. Additionally, practical challenges such as low-quality multi-modal data or missing modalities further complicate the application of object Re-ID. To address these issues, we propose the Complementary Data Enhancement and Modal-Aware Soft Alignment Network (DESANet), which is designed to be independent of interaction networks and adaptable to scenarios with missing modalities, yielding a simple yet effective and efficient approach to multi-modal object Re-ID. DESANet consists of three key components. First, the Dual-Color Space Data Enhancement (DCDE) module enhances multi-modal data by performing patch rotation in the RGB space and improving image quality in the HSV space. Second, the Salient Feature ReConstruction (SFRC) module addresses missing modalities by reconstructing the features of one modality from the other two. Third, the Modal-Aware Soft Alignment (MASA) module integrates multi-source data to avoid the blind fusion of features and to prevent noise from the reconstructed modalities from propagating. Our approach achieves state-of-the-art performance on both person and vehicle datasets. Source code is available at https://github.com/DWJ11/DESANet.
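
To make the two data-facing ideas in the abstract concrete, here is a minimal, hypothetical sketch of dual-color-space enhancement in the spirit of DCDE: a random patch is rotated in RGB space, and saturation/value are boosted in HSV space. The function names, patch fraction, and gain factors are illustrative assumptions, not the authors' implementation (see the released code at the repository above for the actual method).

```python
# Hypothetical sketch of DCDE-style dual-color-space enhancement.
# Assumes numpy and OpenCV; all parameters are illustrative.
import numpy as np
import cv2


def rgb_patch_rotation(img: np.ndarray, patch_frac: float = 0.3) -> np.ndarray:
    """Rotate a randomly chosen square patch of an RGB image by 90 degrees."""
    h, w = img.shape[:2]
    size = int(min(h, w) * patch_frac)
    y = np.random.randint(0, h - size + 1)
    x = np.random.randint(0, w - size + 1)
    out = img.copy()
    out[y:y + size, x:x + size] = np.rot90(out[y:y + size, x:x + size])
    return out


def hsv_quality_boost(img: np.ndarray, sat_gain: float = 1.2, val_gain: float = 1.1) -> np.ndarray:
    """Scale the saturation and value channels in HSV space to improve image quality."""
    hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV).astype(np.float32)
    hsv[..., 1] = np.clip(hsv[..., 1] * sat_gain, 0, 255)
    hsv[..., 2] = np.clip(hsv[..., 2] * val_gain, 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)


def dual_color_space_enhance(img: np.ndarray) -> np.ndarray:
    """Apply both enhancements in sequence, as a rough stand-in for DCDE."""
    return hsv_quality_boost(rgb_patch_rotation(img))
```

Similarly, the SFRC idea of recovering a missing modality from the other two can be pictured as a small feature-level predictor. The MLP fusion below is an assumption for illustration only; the paper's actual reconstruction mechanism may differ, and the modality names in the usage example are likewise illustrative.

```python
# Hypothetical sketch of reconstructing a missing modality's features
# from the two available ones, in the spirit of SFRC. Assumes PyTorch.
import torch
import torch.nn as nn


class MissingModalityReconstructor(nn.Module):
    """Predict one modality's features from the concatenation of the other two."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # feat_a, feat_b: (batch, feat_dim) features from the two observed modalities
        return self.mlp(torch.cat([feat_a, feat_b], dim=-1))


# Usage: reconstruct a thermal-infrared feature when that modality is missing,
# given RGB and near-infrared features (modality names are illustrative).
recon = MissingModalityReconstructor(feat_dim=512)
rgb_feat = torch.randn(4, 512)
nir_feat = torch.randn(4, 512)
tir_feat_hat = recon(rgb_feat, nir_feat)  # shape: (4, 512)
```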