RGB-D Domain adaptive semantic segmentation with cross-modality feature recalibration
Qizhe Fan, Xiaoqin Shen, Shihui Ying, Juan Wang, Shaoyi Du
Information Fusion, Volume 120, Article 103117 (published 2025-03-21)
DOI: 10.1016/j.inffus.2025.103117
https://www.sciencedirect.com/science/article/pii/S1566253525001903
Journal Impact Factor: 14.7 (JCR Q1, Computer Science, Artificial Intelligence)
Citations: 0
Abstract
Unsupervised domain adaptive (UDA) semantic segmentation aims to train models that effectively transfer knowledge from synthetic to real-world images, thereby reducing the reliance on manual annotation. Most existing UDA methods focus primarily on RGB image processing, largely overlooking depth information as a valuable geometric cue that complements RGB representations. While some approaches attempt to incorporate depth information by inferring it from RGB images as an auxiliary task, inaccuracies in depth estimation can still cause localized blurring or distortion in the segmentation results. To address these limitations, we propose CMFRDA, an RGB-D UDA framework that takes both RGB and depth images as inputs and fully exploits their distinct yet complementary properties to improve segmentation performance. Specifically, to mitigate the object boundary noise prevalent in depth information, we propose a Depth Feature Rectification Module (DFRM), which effectively suppresses noise while enhancing the representation of fine structural details. Despite the effectiveness of DFRM, challenges remain: depth sensors produce noisy signals from incomplete surfaces beyond their operational range, and the two modalities may be mismatched. To overcome these challenges, we further introduce a Cross-Modality Feature Recalibration (CMFR) block comprising two key components: Channel-wise Consistency Recalibration (CCR) and Spatial-wise Consistency Recalibration (SCR). CCR suppresses noise from incomplete surfaces in depth features by leveraging the complementary information in RGB features, while SCR exploits the distinctive advantages of both modalities to recalibrate each other, thereby ensuring consistency between the RGB and depth modalities. By integrating DFRM and CMFR, the CMFRDA framework effectively improves UDA semantic segmentation. Extensive experiments demonstrate that CMFRDA achieves competitive performance on two widely used UDA benchmarks, GTA → Cityscapes and Synthia → Cityscapes.
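The abstract does not give implementation details, but the CMFR block as described (channel-wise gating of depth features by RGB context, plus mutual spatial recalibration between the two streams) maps naturally onto standard attention constructions. The following PyTorch sketch is purely illustrative: the layer choices, the `CCR`/`SCR`/`CMFR` module designs, and the additive fusion at the end are all assumptions, not the authors' actual architecture.

```python
# Hypothetical sketch of the Cross-Modality Feature Recalibration (CMFR) idea.
# All module names and layer designs below are assumptions for illustration.
import torch
import torch.nn as nn


class CCR(nn.Module):
    """Channel-wise Consistency Recalibration (assumed design): RGB features
    produce a per-channel gate that suppresses unreliable depth channels,
    e.g., those dominated by incomplete-surface noise."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # global RGB context
            nn.Conv2d(channels, channels // reduction, 1),  # squeeze
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),  # excite
            nn.Sigmoid(),                                   # channel weights in (0, 1)
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        return depth * self.gate(rgb)  # recalibrated depth features


class SCR(nn.Module):
    """Spatial-wise Consistency Recalibration (assumed design): each modality
    produces a spatial attention map that reweights the other, encouraging
    spatially consistent responses across RGB and depth."""

    def __init__(self):
        super().__init__()
        # 2 -> 1 conv over [avg, max] channel statistics, a standard
        # spatial-attention construction.
        self.rgb_attn = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())
        self.depth_attn = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    @staticmethod
    def _stats(x: torch.Tensor) -> torch.Tensor:
        # Per-pixel mean and max over channels: (B, C, H, W) -> (B, 2, H, W)
        return torch.cat([x.mean(1, keepdim=True), x.max(1, keepdim=True).values], dim=1)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor):
        rgb_out = rgb * self.depth_attn(self._stats(depth))    # depth guides RGB
        depth_out = depth * self.rgb_attn(self._stats(rgb))    # RGB guides depth
        return rgb_out, depth_out


class CMFR(nn.Module):
    """CMFR block: CCR followed by SCR; the two streams are fused by
    addition (the fusion operator is an assumption)."""

    def __init__(self, channels: int):
        super().__init__()
        self.ccr = CCR(channels)
        self.scr = SCR()

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        depth = self.ccr(rgb, depth)
        rgb, depth = self.scr(rgb, depth)
        return rgb + depth


# Usage: recalibrate 256-channel feature maps from parallel RGB/depth encoders.
if __name__ == "__main__":
    block = CMFR(channels=256)
    rgb_feat = torch.randn(2, 256, 64, 64)
    depth_feat = torch.randn(2, 256, 64, 64)
    fused = block(rgb_feat, depth_feat)
    print(fused.shape)  # torch.Size([2, 256, 64, 64])
```

Sigmoid-gated reweighting as used above is the common pattern in channel and spatial attention (SE- and CBAM-style blocks); the actual CMFRDA modules may differ substantially in both structure and fusion strategy.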
About the Journal:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.