FCEGNet: Feature calibration and edge-guided MLP decoder Network for RGB-D semantic segmentation

IF 3.5 | CAS Tier 3, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Yiming Lu, Bin Ge, Chenxing Xia, Xu Zhu, Mengge Zhang, Mengya Gao, Ningjie Chen, Jianjun Hu, Junjie Zhi
{"title":"FCEGNet: Feature calibration and edge-guided MLP decoder Network for RGB-D semantic segmentation","authors":"Yiming Lu ,&nbsp;Bin Ge ,&nbsp;Chenxing Xia ,&nbsp;Xu Zhu ,&nbsp;Mengge Zhang ,&nbsp;Mengya Gao ,&nbsp;Ningjie Chen ,&nbsp;Jianjun Hu ,&nbsp;Junjie Zhi","doi":"10.1016/j.cviu.2025.104448","DOIUrl":null,"url":null,"abstract":"<div><div>The references from depth image data provide rich geometric information for traditional RGB semantic segmentation, which effectively improves the performance of semantic segmentation. However, during the process of feature fusion, there are feature biases between RGB features and depth features, which negatively affect cross-modal feature fusion. In this paper, we propose a novel RGB-D network, FCEGNet, consisting of a Feature Calibration Interaction Module (FCIM), a Three-Stream Fusion Extraction Module(TFEM), and an edge-guided MLP decoder. FCIM processes features in different orientations and scales by balancing features across modalities, and exchanges spatial information to allow RGB and depth features to be calibrated and interact with cross-modal features. TFEM performs feature extraction on cross-modal features and combines them with unimodal features to improve the accuracy of enhanced semantic understanding and fine-grained recognition. Dual-stream edge guidance module (DEGM) is designed in the edge-guided MLP decoder to protect the consistency and disparity of cross-modal features while enhancing the edge information and preserving the spatial information, which helps to obtain more accurate segmentation results. Experimental results on the RGB-D dataset show that the proposed FCFGNet is superior and more efficient than several state-of-the-art methods. The generalised validation of FCEGNet on the RGB-T semantic segmentation dataset also achieves better results.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"260 ","pages":"Article 104448"},"PeriodicalIF":3.5000,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1077314225001717","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Depth images provide rich geometric information that complements conventional RGB semantic segmentation and can substantially improve its performance. However, during feature fusion, biases between RGB features and depth features degrade cross-modal fusion. In this paper, we propose a novel RGB-D network, FCEGNet, consisting of a Feature Calibration Interaction Module (FCIM), a Three-Stream Fusion Extraction Module (TFEM), and an edge-guided MLP decoder. FCIM balances features across modalities at different orientations and scales and exchanges spatial information, so that RGB and depth features are calibrated and interact with cross-modal features. TFEM extracts cross-modal features and combines them with unimodal features to improve semantic understanding and fine-grained recognition. A dual-stream edge guidance module (DEGM) in the edge-guided MLP decoder preserves the consistency and disparity of cross-modal features while enhancing edge information and retaining spatial information, which yields more accurate segmentation results. Experimental results on RGB-D datasets show that the proposed FCEGNet outperforms several state-of-the-art methods in both accuracy and efficiency. Generalization experiments on an RGB-T semantic segmentation dataset further confirm the effectiveness of FCEGNet.
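The paper's source code is not reproduced on this page. To make the calibration idea concrete, below is a minimal PyTorch sketch of a cross-modal feature calibration block in the spirit of FCIM. The class name `FeatureCalibrationSketch`, the layer sizes, and the specific gating scheme (cross-modal channel attention plus a shared spatial attention map) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of cross-modal feature calibration, loosely in the
# spirit of FCIM. The paper does not publish this code; every design choice
# below (gating each modality with channel statistics from the other, then
# exchanging location cues via a shared spatial attention map) is an
# illustrative assumption, not FCEGNet's actual module.
import torch
import torch.nn as nn


class FeatureCalibrationSketch(nn.Module):
    """Calibrates aligned RGB and depth feature maps against each other."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 1)
        # Channel-attention branches: each modality produces a gate that
        # recalibrates the channels of the opposite modality.
        self.rgb_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, 1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1), nn.Sigmoid(),
        )
        self.depth_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, 1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1), nn.Sigmoid(),
        )
        # Spatial-exchange branch: one attention map computed from both
        # modalities, used to pass spatial information between them.
        self.spatial = nn.Sequential(
            nn.Conv2d(2 * channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor):
        # Cross-modal channel calibration: depth statistics gate RGB
        # channels and vice versa, countering per-modality feature biases.
        rgb_cal = rgb * self.depth_gate(depth)
        depth_cal = depth * self.rgb_gate(rgb)
        # Shared spatial attention lets the modalities exchange location cues.
        attn = self.spatial(torch.cat([rgb_cal, depth_cal], dim=1))
        return rgb_cal + attn * depth_cal, depth_cal + attn * rgb_cal


# Usage: two aligned feature maps of the same shape, e.g. from one stage of
# parallel RGB and depth encoder branches.
if __name__ == "__main__":
    rgb = torch.randn(2, 64, 60, 80)
    depth = torch.randn(2, 64, 60, 80)
    fcim = FeatureCalibrationSketch(64)
    out_rgb, out_depth = fcim(rgb, depth)
    print(out_rgb.shape, out_depth.shape)  # torch.Size([2, 64, 60, 80]) twice
```

Gating each modality with statistics from the other is one common way to address the cross-modal feature bias the abstract describes; FCEGNet's actual FCIM may differ substantially in structure and detail.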
Source journal
Computer Vision and Image Understanding
Category: Engineering Technology - Engineering: Electrical & Electronic
CiteScore: 7.80
Self-citation rate: 4.40%
Articles per year: 112
Review time: 79 days
Journal introduction: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.

Research areas include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems