GRdepth: Enrich feature with global information and self-iterative regulation network for monocular depth estimation

IF 2.9 · CAS Zone 3 (Engineering Technology) · JCR Q2 (Engineering, Electrical & Electronic)

Digital Signal Processing Pub Date : 2025-07-01 DOI:10.1016/j.dsp.2025.105434

Chenxing Xia, Aoqi Zhang, Xiuju Gao, Bin Ge, Kuan-Ching Li, Xianjin Fang, Xingzhu Liang, Yan Zhang

Volume 167, Article 105434 · https://www.sciencedirect.com/science/article/pii/S1051200425004567

Citations: 0

Abstract

Monocular depth estimation (MDE) seeks to infer a dense, pixel-wise depth map from a single RGB image. Recent methods predominantly use an encoder-decoder architecture to extract and analyze multi-scale features. However, they tend to overlook the role that high-level features rich in global information play in MDE, which leaves the model with a poor understanding of the overall scene structure. To address this, we propose a novel encoder-decoder framework called GRdepth, which includes a cross large scale feature enhancement (CLSE) module and an iterative regulation decoder (IRD). Specifically, the CLSE module uses high-level features, enriched with global information extracted by a global information aggregation (GIA) unit, to guide the enhancement of the multi-scale feature maps produced by the encoder. This enhancement is achieved through a cross large scale feature fusion (CLSF) unit, built from channel attention and spatial attention, which refines low-level features with high-level information. The IRD is tailored to classification-regression-based MDE: it uses a bin width self-regulation (SRbins) unit to adjust the widths of the initial bins predicted from the bottleneck features. This adjustment is guided by the bin widths predicted by an iterative adaptive feature fusion (IAFF) unit at each level, effectively combining global and local information to obtain more accurate bin widths and bin centers. Extensive experiments on the indoor datasets NYU-Depth-v2 and SUN-RGBD and on the outdoor dataset KITTI demonstrate that our method achieves results comparable to the state of the art (SOTA).
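The classification-regression formulation mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It shows the formulation commonly used in bin-based depth estimation: bin widths are normalized, converted to bin centers over a depth range, and the per-pixel depth is the probability-weighted sum of the centers. The function names, the depth range, and in particular the blending rule in `regulate_widths` (with its `alpha` parameter) are assumptions made for illustration; the abstract does not specify how the SRbins unit combines widths.

```python
import numpy as np

def bin_centers(widths, d_min, d_max):
    """Convert positive bin widths into bin centers on [d_min, d_max]."""
    w = np.asarray(widths, dtype=float)
    w = w / w.sum()  # normalize so the widths sum to 1
    edges = d_min + (d_max - d_min) * np.concatenate(([0.0], np.cumsum(w)))
    return 0.5 * (edges[:-1] + edges[1:])

def regulate_widths(initial, predicted, alpha=0.5):
    """Hypothetical regulation step (an assumption, not the paper's rule):
    blend the initial normalized widths with newly predicted ones."""
    i = np.asarray(initial, dtype=float)
    p = np.asarray(predicted, dtype=float)
    i = i / i.sum()
    p = p / p.sum()
    return (1.0 - alpha) * i + alpha * p  # still sums to 1

def depth_from_bins(logits, centers):
    """logits: (H, W, N) per-pixel bin scores; centers: (N,).
    Softmax over the bin axis, then the expected value of the centers."""
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(z)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ centers  # (H, W) depth map

# Usage: four equal bins over an assumed 0-8 m range, uniform bin scores.
centers = bin_centers([1, 1, 1, 1], d_min=0.0, d_max=8.0)
depth = depth_from_bins(np.zeros((2, 3, 4)), centers)
```

In an iterative decoder, a step like `regulate_widths` would be applied once per level, and the centers recomputed from the updated widths before the final depth is formed.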


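Source journal

Digital Signal Processing (Engineering Technology: Engineering, Electrical & Electronic)

CiteScore: 5.30

Self-citation rate: 17.20%

Articles published: 435

Review time: 66 days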

Journal description: Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal. The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology,
• financial time series analysis,
• autonomous vehicles,
• quantum computing,
• neuromorphic engineering,
• human-computer interaction and intelligent user interfaces,
• environmental signal processing,
• geophysical signal processing including seismic signal processing,
• chemioinformatics and bioinformatics,
• audio, visual and performance arts,
• disaster management and prevention,
• renewable energy,