Title: Multimodal Cross-City Semantic Segmentation Based on Similarity-Inspired Fusion and Invertible Transformation Learning Network
Authors: Lijia Dong, Wen Jiang, Zhengyi Xu, Jie Geng
Journal: IEEE Transactions on Neural Networks and Learning Systems (impact factor 8.9; JCR Q1, Computer Science, Artificial Intelligence)
DOI: 10.1109/tnnls.2025.3617345 (https://doi.org/10.1109/tnnls.2025.3617345)
Publication date: 2025-10-09
Publication type: Journal Article
Open access: No
Citations: 0
Abstract
Multimodal cross-city semantic segmentation aims to adapt a network trained on multiple labeled source domains (MSDs) from one city to multiple unlabeled target domains (MTDs) in another city, where the multiple domains correspond to different sensor modalities. However, using remote sensing data from different sensors increases the domain shift in the fused domain space, which makes feature alignment more challenging. Meanwhile, traditional fusion methods consider complementarity only within the MSDs (or within the MTDs), which wastes relevant cross-domain information and leaves the domain shift uncontrolled. To address these issues, we propose a similarity-inspired fusion and invertible transformation learning network (SFITNet) for multimodal cross-city semantic segmentation. To alleviate the increased alignment difficulty in multimodal fused domains, we propose an invertible transformation learning strategy (ITLS), which adopts a topological perspective on unsupervised domain adaptation. After feature fusion, this strategy uses invertible neural networks (INNs) to simulate the potential distribution transformation function between the MSD and the MTD, so that distribution alignment is performed independently within the two feature spaces. We also design a cross-domain similarity-inspired information interaction module (CDSiM), which considers the correspondence between the MSD and the MTD in the fusion stage, effectively uses multimodal complementary information, and promotes the subsequent alignment of the fused domain shift. Semantic segmentation experiments are conducted on the public C2Seg-AB dataset and a new multimodal cross-city Su-Wu dataset. Compared with several state-of-the-art techniques, the experimental results demonstrate the superiority of the proposed SFITNet.
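The abstract states that ITLS builds on invertible neural networks but does not describe their architecture. As a rough illustration of what "invertible" means here, the sketch below implements one common INN building block, an affine coupling layer in the style of RealNVP. This is an assumption for illustration only and is not the authors' ITLS: the class name `AffineCoupling`, the random weights, and the feature dimension are all hypothetical.

```python
import math
import random


class AffineCoupling:
    """Generic affine coupling layer (RealNVP-style).

    Illustrative assumption only; NOT the ITLS architecture from the paper.
    """

    def __init__(self, dim, seed=0):
        assert dim % 2 == 0, "dim must be even so the vector splits in half"
        rng = random.Random(seed)
        self.half = dim // 2
        # Hypothetical small linear maps that produce the log-scale and shift.
        self.w_s = [[rng.uniform(-0.5, 0.5) for _ in range(self.half)]
                    for _ in range(self.half)]
        self.w_t = [[rng.uniform(-0.5, 0.5) for _ in range(self.half)]
                    for _ in range(self.half)]

    def _scale_shift(self, a):
        # tanh bounds the log-scale so exp() stays numerically stable.
        s = [math.tanh(sum(w * x for w, x in zip(row, a))) for row in self.w_s]
        t = [sum(w * x for w, x in zip(row, a)) for row in self.w_t]
        return s, t

    def forward(self, x):
        # The first half passes through unchanged and conditions the second half.
        a, b = x[:self.half], x[self.half:]
        s, t = self._scale_shift(a)
        return a + [bi * math.exp(si) + ti for bi, si, ti in zip(b, s, t)]

    def inverse(self, y):
        # The first half is unchanged, so s and t can be recomputed exactly.
        a, b = y[:self.half], y[self.half:]
        s, t = self._scale_shift(a)
        return a + [(bi - ti) * math.exp(-si) for bi, si, ti in zip(b, s, t)]


# Usage: map a (hypothetical) fused source feature forward, then recover it.
layer = AffineCoupling(dim=4, seed=1)
source_feature = [0.3, -1.2, 2.0, 0.5]
mapped = layer.forward(source_feature)
recovered = layer.inverse(mapped)
max_error = max(abs(u - v) for u, v in zip(source_feature, recovered))
```

Because the inverse is computed in closed form, a feature mapped in one direction can be mapped back without information loss (up to floating-point error), which is the property that lets a single learned transformation be applied in either direction between two feature spaces.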
About the journal:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.