RGB-D Domain adaptive semantic segmentation with cross-modality feature recalibration

Qizhe Fan, Xiaoqin Shen, Shihui Ying, Juan Wang, Shaoyi Du
{"title":"基于跨模态特征再标定的RGB-D域自适应语义分割","authors":"Qizhe Fan , Xiaoqin Shen , Shihui Ying , Juan Wang , Shaoyi Du","doi":"10.1016/j.inffus.2025.103117","DOIUrl":null,"url":null,"abstract":"<div><div>Unsupervised domain adaptive (UDA) semantic segmentation aims to train models that effectively transfer knowledge from synthetic to real-world images, thereby reducing the reliance on manual annotation. Currently, most existing UDA methods primarily focus on RGB image processing, largely overlooking depth information as a valuable geometric cue that complements RGB representations. Additionally, while some approaches attempt to incorporate depth information by inferring it from RGB images as an auxiliary task, inaccuracies in depth estimation can still result in localized blurring or distortion in segmentation outcomes. To comprehensively address these limitations, we propose an innovative RGB-D UDA framework CMFRDA, which seamlessly integrates both RGB and depth images as inputs, fully leveraging their distinct yet complementary properties to improve segmentation performance. Specifically, to mitigate the prevalent object boundary noise in depth information, we propose a Depth Feature Rectification Module (DFRM), which effectively suppresses noise while enhancing the representation of fine structural details. Nevertheless, despite the effectiveness of DFRM, challenges remain due to the presence of noisy signals arising from incomplete surface data beyond the operational range of depth sensors, as well as potential mismatches between modalities. In order to overcome these challenges, we further introduce a Cross-Modality Feature Recalibration (CMFR) block. CMFR comprises two key components: Channel-wise Consistency Recalibration (CCR) and Spatial-wise Consistency Recalibration (SCR). CCR suppresses noise from incomplete surfaces in depth by leveraging the complementary information provided by RGB features, while SCR exploits the distinctive advantages of both modalities to mutually recalibrate each other, thereby ensuring consistency between RGB and depth modalities. By seamlessly integrating DFRM and CMFR, our CMFRDA framework effectively improves the performance of UDA semantic segmentation. Multitudinous experiments demonstrate that our CMFRDA achieves competitive performance on two widely-used UDA benchmarks GTA <span><math><mo>→</mo></math></span> Cityscapes and Synthia <span><math><mo>→</mo></math></span> Cityscapes.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103117"},"PeriodicalIF":14.7000,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"RGB-D Domain adaptive semantic segmentation with cross-modality feature recalibration\",\"authors\":\"Qizhe Fan , Xiaoqin Shen , Shihui Ying , Juan Wang , Shaoyi Du\",\"doi\":\"10.1016/j.inffus.2025.103117\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Unsupervised domain adaptive (UDA) semantic segmentation aims to train models that effectively transfer knowledge from synthetic to real-world images, thereby reducing the reliance on manual annotation. Currently, most existing UDA methods primarily focus on RGB image processing, largely overlooking depth information as a valuable geometric cue that complements RGB representations. 
Additionally, while some approaches attempt to incorporate depth information by inferring it from RGB images as an auxiliary task, inaccuracies in depth estimation can still result in localized blurring or distortion in segmentation outcomes. To comprehensively address these limitations, we propose an innovative RGB-D UDA framework CMFRDA, which seamlessly integrates both RGB and depth images as inputs, fully leveraging their distinct yet complementary properties to improve segmentation performance. Specifically, to mitigate the prevalent object boundary noise in depth information, we propose a Depth Feature Rectification Module (DFRM), which effectively suppresses noise while enhancing the representation of fine structural details. Nevertheless, despite the effectiveness of DFRM, challenges remain due to the presence of noisy signals arising from incomplete surface data beyond the operational range of depth sensors, as well as potential mismatches between modalities. In order to overcome these challenges, we further introduce a Cross-Modality Feature Recalibration (CMFR) block. CMFR comprises two key components: Channel-wise Consistency Recalibration (CCR) and Spatial-wise Consistency Recalibration (SCR). CCR suppresses noise from incomplete surfaces in depth by leveraging the complementary information provided by RGB features, while SCR exploits the distinctive advantages of both modalities to mutually recalibrate each other, thereby ensuring consistency between RGB and depth modalities. By seamlessly integrating DFRM and CMFR, our CMFRDA framework effectively improves the performance of UDA semantic segmentation. Multitudinous experiments demonstrate that our CMFRDA achieves competitive performance on two widely-used UDA benchmarks GTA <span><math><mo>→</mo></math></span> Cityscapes and Synthia <span><math><mo>→</mo></math></span> Cityscapes.</div></div>\",\"PeriodicalId\":50367,\"journal\":{\"name\":\"Information Fusion\",\"volume\":\"120 \",\"pages\":\"Article 103117\"},\"PeriodicalIF\":14.7000,\"publicationDate\":\"2025-03-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Fusion\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1566253525001903\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525001903","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Unsupervised domain adaptive (UDA) semantic segmentation aims to train models that effectively transfer knowledge from synthetic to real-world images, thereby reducing the reliance on manual annotation. Most existing UDA methods focus primarily on RGB image processing, largely overlooking depth information as a valuable geometric cue that complements RGB representations. Moreover, while some approaches incorporate depth by inferring it from RGB images as an auxiliary task, inaccuracies in the estimated depth can still cause localized blurring or distortion in the segmentation results. To address these limitations, we propose CMFRDA, an RGB-D UDA framework that takes both RGB and depth images as inputs and exploits their distinct yet complementary properties to improve segmentation performance. Specifically, to mitigate the object boundary noise prevalent in depth information, we propose a Depth Feature Rectification Module (DFRM), which suppresses noise while enhancing the representation of fine structural details. Despite the effectiveness of DFRM, challenges remain: depth sensors produce noisy signals from incomplete surfaces beyond their operational range, and the two modalities may be mismatched. To overcome these challenges, we further introduce a Cross-Modality Feature Recalibration (CMFR) block comprising two key components: Channel-wise Consistency Recalibration (CCR) and Spatial-wise Consistency Recalibration (SCR). CCR suppresses noise from incomplete surfaces in depth by leveraging the complementary information provided by RGB features, while SCR exploits the distinctive advantages of both modalities to recalibrate each other, thereby ensuring consistency between the RGB and depth modalities. By integrating DFRM and CMFR, our CMFRDA framework effectively improves the performance of UDA semantic segmentation. Extensive experiments demonstrate that CMFRDA achieves competitive performance on two widely used UDA benchmarks, GTA → Cityscapes and Synthia → Cityscapes.
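To make the recalibration idea concrete, below is a minimal PyTorch sketch of a CMFR-style block. Everything here is an illustrative assumption rather than the paper's actual implementation: the module names, the pooled-context MLP used for CCR, the 7x7 convolutional spatial gates used for SCR, and the additive fusion at the end are placeholders chosen only to show how channel-wise and spatial-wise cross-modality recalibration can be wired together.

# Minimal sketch of a CMFR-style block (illustrative assumptions only;
# the paper's CCR/SCR designs may differ in every detail).
import torch
import torch.nn as nn


class ChannelConsistencyRecalibration(nn.Module):
    """CCR (assumed form): reweight depth channels using pooled RGB+depth
    context, so channels dominated by depth-sensor noise are suppressed."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # Global average pooling over spatial dims of both modalities.
        ctx = torch.cat([rgb.mean(dim=(2, 3)), depth.mean(dim=(2, 3))], dim=1)
        weights = self.mlp(ctx).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        return depth * weights  # RGB-guided channel gating of depth


class SpatialConsistencyRecalibration(nn.Module):
    """SCR (assumed form): each modality produces a spatial gate that
    recalibrates the other, encouraging cross-modal consistency."""

    def __init__(self, channels: int):
        super().__init__()
        self.rgb_gate = nn.Sequential(nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())
        self.depth_gate = nn.Sequential(nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor):
        rgb_out = rgb * self.depth_gate(depth)    # depth recalibrates RGB
        depth_out = depth * self.rgb_gate(rgb)    # RGB recalibrates depth
        return rgb_out, depth_out


class CMFR(nn.Module):
    """Chains CCR and SCR, then fuses the recalibrated features."""

    def __init__(self, channels: int):
        super().__init__()
        self.ccr = ChannelConsistencyRecalibration(channels)
        self.scr = SpatialConsistencyRecalibration(channels)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        depth = self.ccr(rgb, depth)       # suppress incomplete-surface noise
        rgb, depth = self.scr(rgb, depth)  # mutual spatial recalibration
        return rgb + depth                 # simple additive fusion (assumption)


if __name__ == "__main__":
    rgb_feat = torch.randn(2, 64, 32, 32)
    depth_feat = torch.randn(2, 64, 32, 32)
    fused = CMFR(64)(rgb_feat, depth_feat)
    print(fused.shape)  # torch.Size([2, 64, 32, 32])

The key design point the sketch tries to capture is the asymmetry described above: CCR is one-directional (RGB context cleans depth channels), whereas SCR is mutual (each modality spatially gates the other).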
About the journal:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines that drive its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.