{"title":"转义模态交互:多模态对象再识别的高效DESANet","authors":"Wenjiao Dong;Xi Yang;De Cheng;Nannan Wang;Xinbo Gao","doi":"10.1109/TIP.2025.3592575","DOIUrl":null,"url":null,"abstract":"Multi-modal object Re-ID aims to leverage the complementary information provided by multiple modalities to overcome challenging conditions and achieve high-quality object matching. However, existing multi-modal methods typically rely on various modality interaction modules for information fusion, which can reduce the efficiency of real-time monitoring systems. Additionally, practical challenges such as low-quality multi-modal data or missing modalities further complicate the application of object Re-ID. To address these issues, we propose the Complementary Data Enhancement and Modal-Aware Soft Alignment Network (DESANet), which is designed to be independent of interactive networks and adaptable to scenarios with missing modalities. This approach ensures a simple-yet-effective, and efficient multi-modal object Re-ID. DESANet consists of three key components: Firstly, the Dual-Color Space Data Enhancement (DCDE) module, which enhances multi-modal data by performing patch rotation in the RGB space and improving image quality in the HSV space. Secondly, the Salient Feature ReConstruction (SFRC) module, which addresses the issue of missing modalities by reconstructing features from one modality using the other two. Thirdly, the Modal-Aware Soft Alignment (MASA) module, which integrates multi-source data to avoid the blind fusion of features and prevents the propagation of noise from reconstructed modalities. Our approach achieves state-of-the-art performances on both person and vehicle datasets. Source code is available at <uri>https://github.com/DWJ11/DESANet</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"5068-5083"},"PeriodicalIF":13.7000,"publicationDate":"2025-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Escaping Modal Interactions: An Efficient DESANet for Multi-Modal Object Re-Identification\",\"authors\":\"Wenjiao Dong;Xi Yang;De Cheng;Nannan Wang;Xinbo Gao\",\"doi\":\"10.1109/TIP.2025.3592575\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multi-modal object Re-ID aims to leverage the complementary information provided by multiple modalities to overcome challenging conditions and achieve high-quality object matching. However, existing multi-modal methods typically rely on various modality interaction modules for information fusion, which can reduce the efficiency of real-time monitoring systems. Additionally, practical challenges such as low-quality multi-modal data or missing modalities further complicate the application of object Re-ID. To address these issues, we propose the Complementary Data Enhancement and Modal-Aware Soft Alignment Network (DESANet), which is designed to be independent of interactive networks and adaptable to scenarios with missing modalities. This approach ensures a simple-yet-effective, and efficient multi-modal object Re-ID. DESANet consists of three key components: Firstly, the Dual-Color Space Data Enhancement (DCDE) module, which enhances multi-modal data by performing patch rotation in the RGB space and improving image quality in the HSV space. 
Secondly, the Salient Feature ReConstruction (SFRC) module, which addresses the issue of missing modalities by reconstructing features from one modality using the other two. Thirdly, the Modal-Aware Soft Alignment (MASA) module, which integrates multi-source data to avoid the blind fusion of features and prevents the propagation of noise from reconstructed modalities. Our approach achieves state-of-the-art performances on both person and vehicle datasets. Source code is available at <uri>https://github.com/DWJ11/DESANet</uri>\",\"PeriodicalId\":94032,\"journal\":{\"name\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"volume\":\"34 \",\"pages\":\"5068-5083\"},\"PeriodicalIF\":13.7000,\"publicationDate\":\"2025-07-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11104996/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11104996/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Escaping Modal Interactions: An Efficient DESANet for Multi-Modal Object Re-Identification
Multi-modal object Re-ID aims to leverage the complementary information provided by multiple modalities to overcome challenging conditions and achieve high-quality object matching. However, existing multi-modal methods typically rely on various modality-interaction modules for information fusion, which can reduce the efficiency of real-time monitoring systems. Additionally, practical challenges such as low-quality multi-modal data or missing modalities further complicate the application of object Re-ID. To address these issues, we propose the Complementary Data Enhancement and Modal-Aware Soft Alignment Network (DESANet), which is designed to be independent of interaction networks and adaptable to scenarios with missing modalities, ensuring simple, effective, and efficient multi-modal object Re-ID. DESANet consists of three key components. First, the Dual-Color Space Data Enhancement (DCDE) module enhances multi-modal data by performing patch rotation in the RGB space and improving image quality in the HSV space. Second, the Salient Feature ReConstruction (SFRC) module addresses missing modalities by reconstructing the features of one modality from the other two. Third, the Modal-Aware Soft Alignment (MASA) module integrates multi-source data to avoid blind feature fusion and to prevent noise from reconstructed modalities from propagating. Our approach achieves state-of-the-art performance on both person and vehicle datasets. Source code is available at https://github.com/DWJ11/DESANet.
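
The abstract describes DCDE only at a high level. As a rough illustration of "patch rotation in the RGB space plus quality improvement in the HSV space", the following Python sketch shows one plausible reading; the 4x4 patch grid, the 90-degree rotations, and the gamma curve on the HSV value channel are our assumptions for demonstration, not details taken from the paper (see the repository above for the actual implementation).

```python
# Illustrative sketch only: the paper does not specify grid size, rotation
# angles, or the HSV-space operation; the choices below (4x4 grid, multiples
# of 90 degrees, gamma on the V channel) are assumptions for demonstration.
import numpy as np
import cv2

def dcde_style_augment(img_rgb, grid=4, gamma=0.8, rng=None):
    """Patch rotation in RGB space followed by a brightness boost in HSV space.

    img_rgb: HxWx3 uint8 RGB image.
    """
    rng = rng or np.random.default_rng()
    h, w = img_rgb.shape[:2]
    ph, pw = h // grid, w // grid
    out = img_rgb[:ph * grid, :pw * grid].copy()  # crop so patches tile evenly

    # RGB space: rotate each grid patch by a random multiple of 90 degrees.
    for i in range(grid):
        for j in range(grid):
            patch = out[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
            k = int(rng.integers(0, 4))
            if ph != pw:
                k = 2 * (k % 2)  # non-square patches only allow 0/180 degrees
            patch[:] = np.rot90(patch.copy(), k)

    # HSV space: a gamma curve (< 1) on the V channel lifts dark regions,
    # a simple stand-in for "improving image quality in the HSV space".
    hsv = cv2.cvtColor(out, cv2.COLOR_RGB2HSV).astype(np.float32)
    hsv[..., 2] = 255.0 * (hsv[..., 2] / 255.0) ** gamma
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)
```

Because the augmentation operates on raw images before the backbone, a scheme like this adds no inference-time cost, which is consistent with the abstract's emphasis on efficiency.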
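SFRC and MASA are likewise described only at a high level. The PyTorch sketch below is an assumption-laden reading, not the released implementation: a small MLP (our stand-in for SFRC) reconstructs a missing modality's feature vector from the two available ones, and a learned softmax gate (our stand-in for MASA) weights the per-modality features so that noisier, reconstructed features can be downweighted instead of blindly averaged. The class name, MLP shape, and gating form are all hypothetical.

```python
# Illustrative sketch only: the MLP reconstructor and softmax gate are
# stand-ins for SFRC and MASA; the real modules live in the authors' repo.
import torch
import torch.nn as nn

class ReconstructAndFuse(nn.Module):
    def __init__(self, dim=512, n_modal=3):
        super().__init__()
        # SFRC-style: one reconstructor per modality, mapping the concatenated
        # features of the two available modalities to the missing one.
        self.recon = nn.ModuleList(
            nn.Sequential(nn.Linear((n_modal - 1) * dim, dim),
                          nn.ReLU(),
                          nn.Linear(dim, dim))
            for _ in range(n_modal)
        )
        # MASA-style: score each modality feature so that reconstructed
        # (noisier) features can receive smaller fusion weights.
        self.gate = nn.Linear(dim, 1)

    def forward(self, feats, missing=None):
        """feats: list of per-modality (B, dim) tensors; feats[missing] is ignored."""
        feats = list(feats)
        if missing is not None:
            others = [f for i, f in enumerate(feats) if i != missing]
            feats[missing] = self.recon[missing](torch.cat(others, dim=-1))
        stacked = torch.stack(feats, dim=1)                 # (B, M, dim)
        weights = torch.softmax(self.gate(stacked), dim=1)  # (B, M, 1)
        return (weights * stacked).sum(dim=1)               # soft, not blind, fusion
```

For example, calling the module with `([rgb_f, nir_f, tir_f], missing=2)` would fuse the RGB and NIR features while substituting a reconstruction for the absent thermal features, matching the missing-modality scenario the abstract targets.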