Dual-Path Attention Network for Lightweight Self-Supervised Monocular Depth Estimation

IF 4.3 | Zone 2 (comprehensive journals) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Sensors Journal Pub Date : 2025-08-29 DOI:10.1109/JSEN.2025.3601212

Chao Zhang;Tian Tian;Cheng Han;Tiancheng Shao;Mi Zhou;Shichao Zhao


Citations: 0

Abstract

Dual-Path Attention Network for Lightweight Self-Supervised Monocular Depth Estimation

Self-supervised monocular depth estimation trains without depth labels by exploiting the geometric consistency of image sequences, which makes it valuable in fields such as autonomous driving. Traditional methods rely on complex hybrid CNN and transformer architectures to balance local and global features, but they suffer from large parameter counts and low computational efficiency, which severely limit deployment on edge devices. Although existing lightweight methods reduce the number of parameters through techniques such as depthwise separable convolution and channel compression, they still have problems such as insufficient multiscale feature fusion, limited interaction between global and local context information, and loss of detail at the edges of the depth map. To solve these problems, we propose LM-DualNet, a novel architecture with dual-path attention enhancement. Specifically, the encoder integrates a dynamic local context-aware (DLCA) module for capturing fine-grained local structures, and a dual-axis gated attention (DAGA) module that constructs two parallel attention paths, spatial and channel, to jointly model positional dependencies and cross-channel correlations. In the decoder, we design a multiscale depth enhancement (MSDE) module to refine edge regions and enhance depth continuity. Experiments on the KITTI dataset show that LM-DualNet reduces the absolute relative error and squared relative error to 0.106 and 0.731, respectively, and reaches an accuracy of 88.8%, improving on other state-of-the-art algorithms.
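The error figures quoted in the abstract can be made concrete with a small sketch. The code below uses the standard definitions of absolute relative error and squared relative error for depth evaluation. The abstract does not define "accuracy", so the sketch assumes it means the threshold accuracy commonly reported on KITTI (the fraction of pixels whose ratio max(pred/gt, gt/pred) is below 1.25). The function name `depth_metrics` and the key names are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def depth_metrics(
    pred: Sequence[float],
    gt: Sequence[float],
    threshold: float = 1.25,  # assumed accuracy threshold, not stated in the abstract
) -> Dict[str, float]:
    """Compute Abs Rel, Sq Rel and threshold accuracy over valid pixels.

    A pixel is treated as valid when both the ground-truth and the
    predicted depth are strictly positive.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")

    # Keep only pixels with valid (positive) depth values.
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid depth pixels to evaluate")
    n = len(pairs)

    # Absolute relative error: mean of |pred - gt| / gt.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Squared relative error: mean of (pred - gt)^2 / gt.
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    # Threshold accuracy: share of pixels with max(pred/gt, gt/pred) < threshold.
    delta_1 = sum(1 for p, g in pairs if max(p / g, g / p) < threshold) / n

    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "delta_1": delta_1}


if __name__ == "__main__":
    # Toy depths in metres, for illustration only (not data from the paper).
    predicted = [2.0, 4.5, 12.0]
    ground_truth = [2.0, 5.0, 8.0]
    for name, value in depth_metrics(predicted, ground_truth).items():
        print(f"{name}: {value:.3f}")
```

Lower is better for the two error terms and higher is better for the threshold accuracy, which matches how the abstract reports its results.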

Source journal

IEEE Sensors Journal | Engineering & Technology: Engineering, Electrical & Electronic

CiteScore: 7.70

Self-citation rate: 14.00%

Articles published: 2058

Review time: 5.2 months

Journal scope: The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing and applications of devices for sensing and transducing physical, chemical and biological phenomena, with emphasis on the electronics and physics aspects of sensors and integrated sensors-actuators. IEEE Sensors Journal deals with the following:
- Sensor Phenomenology, Modelling, and Evaluation
- Sensor Materials, Processing, and Fabrication
- Chemical and Gas Sensors
- Microfluidics and Biosensors
- Optical Sensors
- Physical Sensors: Temperature, Mechanical, Magnetic, and others
- Acoustic and Ultrasonic Sensors
- Sensor Packaging
- Sensor Networks
- Sensor Applications
- Sensor Systems: Signals, Processing, and Interfaces
- Actuators and Sensor Power Systems
- Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
- Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave, e.g., electromagnetic and acoustic, and non-wave, e.g., chemical, gravity, particle, thermal, radiative and non-radiative, sensor data; detection, estimation and classification based on sensor data)
- Sensors in Industrial Practice