Dual-Path Attention Network for Lightweight Self-Supervised Monocular Depth Estimation
Authors: Chao Zhang; Tian Tian; Cheng Han; Tiancheng Shao; Mi Zhou; Shichao Zhao
Journal: IEEE Sensors Journal, vol. 25, no. 19, pp. 37419-37428 (impact factor 4.3; JCR Q1, Engineering, Electrical & Electronic)
Published: 2025-08-29
DOI: 10.1109/JSEN.2025.3601212
URL: https://ieeexplore.ieee.org/document/11145268/
Self-supervised monocular depth estimation trains without depth labels by exploiting the geometric consistency of image sequences, which makes it valuable in fields such as autonomous driving. Traditional methods rely on complex hybrid CNN-transformer architectures to balance local and global features, but their large parameter counts and low computational efficiency severely limit deployment on edge devices. Existing lightweight methods reduce the number of parameters through techniques such as depthwise separable convolution and channel compression, yet they still suffer from insufficient multiscale feature fusion, limited interaction between global and local context information, and loss of detail at depth-map edges. To solve these problems, we propose LM-DualNet, a novel architecture with dual-path attention enhancement. Specifically, the encoder integrates a dynamic local context-aware (DLCA) module for capturing fine-grained local structures and a dual-axis gated attention (DAGA) module that constructs two parallel attention paths, spatial and channel, to jointly model positional dependencies and cross-channel correlations. In the decoder, we design a multiscale depth enhancement (MSDE) module to refine edge regions and enhance depth continuity. Experiments on the KITTI dataset show that LM-DualNet reduces the absolute relative error and squared relative error to 0.106 and 0.731, respectively, and reaches an accuracy of 88.8%, improving on other state-of-the-art algorithms.
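The error measures quoted in the abstract can be made concrete with a small sketch. It uses the standard definitions of absolute relative error (mean of |d - d*| / d*) and squared relative error (mean of (d - d*)^2 / d*), where d is the predicted depth and d* the ground truth. The abstract does not say which accuracy measure the 88.8% refers to; the sketch assumes the common threshold accuracy, the fraction of pixels with max(d/d*, d*/d) < 1.25. The function name `depth_metrics` and the sample values are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def depth_metrics(
    pred: Sequence[float],
    gt: Sequence[float],
    threshold: float = 1.25,
) -> Dict[str, float]:
    """Return Abs Rel, Sq Rel and threshold accuracy over valid pixels.

    The 1.25 threshold is an assumption; the abstract does not define
    its accuracy measure.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")

    # Keep only pixels where both depths are positive, so that every
    # division below is well defined.
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid pixels")
    n = len(pairs)

    # Absolute relative error: mean of |d - d*| / d*.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Squared relative error: mean of (d - d*)^2 / d*.
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    # Threshold accuracy: share of pixels whose ratio is below the threshold.
    delta_1 = sum(1 for p, g in pairs if max(p / g, g / p) < threshold) / n

    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "delta_1": delta_1}


if __name__ == "__main__":
    # Hypothetical depths in metres, for illustration only.
    predicted = [10.0, 22.0, 5.0, 40.0]
    ground_truth = [10.0, 20.0, 8.0, 40.0]
    print(depth_metrics(predicted, ground_truth))
```

For the two error measures lower is better, while for the threshold accuracy higher is better, which is why the abstract reports the errors as decreasing and the accuracy as increasing.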
About the journal:
The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing and applications of devices for sensing and transducing physical, chemical and biological phenomena, with emphasis on the electronics and physics aspects of sensors and integrated sensors-actuators. IEEE Sensors Journal deals with the following:
-Sensor Phenomenology, Modelling, and Evaluation
-Sensor Materials, Processing, and Fabrication
-Chemical and Gas Sensors
-Microfluidics and Biosensors
-Optical Sensors
-Physical Sensors: Temperature, Mechanical, Magnetic, and others
-Acoustic and Ultrasonic Sensors
-Sensor Packaging
-Sensor Networks
-Sensor Applications
-Sensor Systems: Signals, Processing, and Interfaces
-Actuators and Sensor Power Systems
-Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
-Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave, e.g., electromagnetic and acoustic, and non-wave, e.g., chemical, gravity, particle, thermal, radiative and non-radiative, sensor data; detection, estimation and classification based on sensor data)
-Sensors in Industrial Practice