Multimodal Cross-City Semantic Segmentation Based on Similarity-Inspired Fusion and Invertible Transformation Learning Network.

IF 8.9 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE transactions on neural networks and learning systems Pub Date : 2025-10-09 DOI:10.1109/tnnls.2025.3617345

Lijia Dong,Wen Jiang,Zhengyi Xu,Jie Geng

{"title":"Multimodal Cross-City Semantic Segmentation Based on Similarity-Inspired Fusion and Invertible Transformation Learning Network.","authors":"Lijia Dong,Wen Jiang,Zhengyi Xu,Jie Geng","doi":"10.1109/tnnls.2025.3617345","DOIUrl":null,"url":null,"abstract":"Multimodal cross-city semantic segmentation aims to adapt a network trained on multiple labeled source domains (MSDs) from one city to multiple unlabeled target domains (MTDs) in another city, where the multiple domains refer to different sensor modalities. However, remote sensing data from different sensors increases the extent of domain shift in the fused domain space, making feature alignment more challenging. Meanwhile, traditional fusion methods only consider complementarity within MSDs (or MTDs), which wastes cross-domain relevant information and neglects control over domain shift. To address the above issues, we propose a similarity-inspired fusion and invertible transformation learning network (SFITNet) for multimodal cross-city semantic segmentation. To alleviate the increasing alignment difficulty in multimodal fused domains, an invertible transformation learning strategy (ITLS) is proposed, which adopts a topological perspective on unsupervised domain adaptation. This strategy aims to simulate the potential distribution transformation function between the MSD and the MTD based on invertible neural networks (INNs) after feature fusion, thereby performing distribution alignment independently within the two feature spaces. A cross-domain similarity-inspired information interaction module (CDSiM) is also designed, which considers the correspondence between the MSD and the MTD in the fusion stage, effectively utilizes multimodal complementary information and promotes the subsequent alignment of fused domain shifts. The semantic segmentation tests are completed on the public C2Seg-AB dataset and a new multimodal cross-city Su-Wu dataset. Compared with some state-of-the-art techniques, the experimental results demonstrated the superiority of the proposed SFITNet.","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"12 1","pages":""},"PeriodicalIF":8.9000,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tnnls.2025.3617345","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Multimodal cross-city semantic segmentation aims to adapt a network trained on multiple labeled source domains (MSDs) from one city to multiple unlabeled target domains (MTDs) in another city, where the multiple domains refer to different sensor modalities. However, remote sensing data from different sensors increases the extent of domain shift in the fused domain space, making feature alignment more challenging. Meanwhile, traditional fusion methods only consider complementarity within MSDs (or MTDs), which wastes cross-domain relevant information and neglects control over domain shift. To address the above issues, we propose a similarity-inspired fusion and invertible transformation learning network (SFITNet) for multimodal cross-city semantic segmentation. To alleviate the increasing alignment difficulty in multimodal fused domains, an invertible transformation learning strategy (ITLS) is proposed, which adopts a topological perspective on unsupervised domain adaptation. This strategy aims to simulate the potential distribution transformation function between the MSD and the MTD based on invertible neural networks (INNs) after feature fusion, thereby performing distribution alignment independently within the two feature spaces. A cross-domain similarity-inspired information interaction module (CDSiM) is also designed, which considers the correspondence between the MSD and the MTD in the fusion stage, effectively utilizes multimodal complementary information and promotes the subsequent alignment of fused domain shifts. The semantic segmentation tests are completed on the public C2Seg-AB dataset and a new multimodal cross-city Su-Wu dataset. Compared with some state-of-the-art techniques, the experimental results demonstrated the superiority of the proposed SFITNet.

查看原文本刊更多论文

基于相似性启发融合和可逆转换学习网络的多模态跨城市语义分割。

多模态跨城市语义分割旨在将一个城市的多个标记源域（MSDs）训练的网络适应于另一个城市的多个未标记目标域（MTDs），其中多个域指不同的传感器模态。然而，来自不同传感器的遥感数据增加了融合域空间的域移位程度，使得特征对齐更具挑战性。同时，传统的融合方法只考虑msd（或mtd）内部的互补性，浪费了跨域相关信息，忽略了对域转移的控制。为了解决上述问题，我们提出了一种基于相似性启发的融合和可逆转换学习网络（SFITNet），用于多模态跨城市语义分割。为了缓解多模态融合域中日益增加的对齐困难，提出了一种从拓扑角度进行无监督域自适应的可逆转换学习策略。该策略旨在基于可逆神经网络（INNs）模拟特征融合后MSD和MTD之间的潜在分布转换函数，从而在两个特征空间内独立进行分布对齐。设计了跨域相似性启发信息交互模块（CDSiM），该模块在融合阶段考虑了MSD和MTD之间的对应关系，有效利用了多模态互补信息，促进了融合域位移的后续对齐。在C2Seg-AB公共数据集和新的多模态跨城市苏武数据集上完成了语义分割测试。实验结果表明，本文提出的SFITNet算法具有较好的优越性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

IEEE transactions on neural networks and learning systems COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

CiteScore

23.80

自引率

9.60%

发文量

2102

审稿时长

3-8 weeks

期刊介绍： The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.