DFF-Mono: A lightweight self-supervised monocular depth estimation method based on dual-branch feature fusion

IF 3.4 · CAS Region 2 (Engineering & Technology) · JCR Q1, Computer Science, Hardware & Architecture
Han Zhang, Xiaojun Yu, Hengrong Guo, Liang Shen, Zeming Fan
{"title":"DFF-Mono:A lightweight self-supervised monocular depth estimation method based on dual-branch feature fusion","authors":"Han Zhang ,&nbsp;Xiaojun Yu ,&nbsp;Hengrong Guo ,&nbsp;Liang Shen ,&nbsp;Zeming Fan","doi":"10.1016/j.displa.2025.103167","DOIUrl":null,"url":null,"abstract":"<div><div>Monocular depth estimation is one of the fundamental challenges in 3D scene understanding, particularly when operating within the constraints of unsupervised learning paradigms. While existing self-supervised methods avoid the dependency on annotated depth labels, their high computational complexity significantly hinders deployment on resource-constrained mobile platforms. To address this issue, we propose a parameter-efficient framework, namely, DFF-Mono, that synergistically optimizes depth estimation accuracy with computational efficiency. Specifically, the proposed DFF-Mono framework incorporates three main components. While a lightweight encoder that integrates Dual-Kernel Dilated Convolution (DKDC) modules with Dual-branch Feature Fusion (DFF) architecture is proposed for multi-scale feature encoding, a novel Attention-guided Large Kernel Inception (ALKI) module with multi-branch large-kernel convolution is devised to leverage local–global attention guidance for efficient local feature extraction. As a complement, a frequency-domain optimization strategy is also employed to enhance training efficiency. The strategy is achieved via adaptive Gaussian low-pass filtering, without introducing any additional network parameters. Extensive experiments are conducted to verify the effectiveness of the proposed method, and results demonstrate that DFF-Mono is superior over those existing approaches across standard benchmarks. Notably, DFF-Mono reduces model parameters by 23% compared to current state-of-the-art solutions while consistently achieving superior depth accuracy.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"90 ","pages":"Article 103167"},"PeriodicalIF":3.4000,"publicationDate":"2025-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Displays","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0141938225002045","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

Monocular depth estimation is one of the fundamental challenges in 3D scene understanding, particularly under the constraints of unsupervised learning paradigms. While existing self-supervised methods avoid the dependency on annotated depth labels, their high computational complexity significantly hinders deployment on resource-constrained mobile platforms. To address this issue, we propose DFF-Mono, a parameter-efficient framework that jointly optimizes depth estimation accuracy and computational efficiency. The proposed framework comprises three main components. First, a lightweight encoder that integrates Dual-Kernel Dilated Convolution (DKDC) modules within a Dual-branch Feature Fusion (DFF) architecture is proposed for multi-scale feature encoding. Second, a novel Attention-guided Large Kernel Inception (ALKI) module with multi-branch large-kernel convolution is devised to leverage local–global attention guidance for efficient local feature extraction. Third, as a complement, a frequency-domain optimization strategy based on adaptive Gaussian low-pass filtering is employed to improve training efficiency without introducing any additional network parameters. Extensive experiments verify the effectiveness of the proposed method: DFF-Mono outperforms existing approaches across standard benchmarks and, notably, reduces model parameters by 23% compared with current state-of-the-art solutions while consistently achieving superior depth accuracy.
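The abstract only names these components, so the following PyTorch sketch is a speculative illustration rather than the authors' implementation. DKDCBlock is one hypothetical reading of a dual-kernel dilated convolution block with dual-branch fusion (two parallel depthwise dilated convolutions fused by a 1x1 convolution), and gaussian_lowpass shows how a parameter-free Gaussian low-pass filter can be applied in the frequency domain. All class/function names, kernel sizes, dilation rates, the concatenation-based fusion, and the sigma value are assumptions made for illustration.

```python
# Illustrative sketch only; the paper's actual DKDC/DFF and frequency-domain
# designs may differ from the assumptions made here.
import torch
import torch.nn as nn


class DKDCBlock(nn.Module):
    """Hypothetical dual-kernel dilated convolution block: two parallel
    depthwise dilated convolutions with different receptive fields,
    fused by a 1x1 convolution (dual-branch fusion) plus a residual path."""

    def __init__(self, channels: int, k1: int = 3, k2: int = 5, d1: int = 1, d2: int = 2):
        super().__init__()
        self.branch1 = nn.Conv2d(channels, channels, k1, padding=d1 * (k1 - 1) // 2,
                                 dilation=d1, groups=channels)
        self.branch2 = nn.Conv2d(channels, channels, k2, padding=d2 * (k2 - 1) // 2,
                                 dilation=d2, groups=channels)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)  # fuse the two branches
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = torch.cat([self.branch1(x), self.branch2(x)], dim=1)
        return self.act(self.fuse(y)) + x  # residual connection keeps the block lightweight


def gaussian_lowpass(x: torch.Tensor, sigma: float) -> torch.Tensor:
    """Parameter-free Gaussian low-pass filtering in the frequency domain:
    FFT -> multiply by a centered Gaussian mask -> inverse FFT."""
    _, _, h, w = x.shape
    freq = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    yy, xx = torch.meshgrid(torch.arange(h, device=x.device) - h / 2,
                            torch.arange(w, device=x.device) - w / 2, indexing="ij")
    mask = torch.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    filtered = torch.fft.ifft2(torch.fft.ifftshift(freq * mask, dim=(-2, -1)))
    return filtered.real


if __name__ == "__main__":
    feats = torch.randn(1, 32, 64, 64)
    print(DKDCBlock(32)(feats).shape)                 # torch.Size([1, 32, 64, 64])
    print(gaussian_lowpass(feats, sigma=8.0).shape)   # same shape, high frequencies suppressed
```

Because the Gaussian mask contains no learnable weights, a filtering step of this kind adds no network parameters, which is consistent with the abstract's claim that the frequency-domain strategy is parameter-free.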
Source journal
Displays (Engineering & Technology – Engineering: Electrical & Electronic)
CiteScore: 4.60
Self-citation rate: 25.60%
Articles per year: 138
Review time: 92 days
Journal description: Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including the display-human interface. Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the displays community. Original research papers solving ergonomics issues at the display-human interface advance the effective presentation of information. Tutorial papers covering fundamentals, intended for display technology and human factors engineers new to the field, will also occasionally be featured.