{"title":"PRDepth:基于姿态细化增强的室内场景单目深度估计","authors":"Chenggong Han;Chen Lv;Xiaolin Huang;Qiqi Kou;Deqiang Cheng;He Jiang","doi":"10.1109/TIM.2025.3562976","DOIUrl":null,"url":null,"abstract":"Indoor depth measurement is widely used in technologies such as virtual reality and augmented reality. However, indoor scenes are typically captured with handheld cameras, resulting in more complex and unpredictable variations between frames. Self-supervised depth estimation relies on frame-to-frame projection for self-constraint, and inaccurate pose predictions between frames significantly hinder depth estimation in indoor environments. To address this issue, PRDepth is proposed, a self-supervised pose refinement method tailored for indoor environments. PRDepth introduces a pose reconstruction iterative module (PRIM) that refines multiframe pose decomposition and reconstruction. By leveraging contextual information from intermediate frames, it mitigates estimation errors caused by large rotations and reduces training errors due to inaccurate rotations, leading to more precise pose predictions between frames. Additionally, to enhance the information exchange and context integration capabilities of the depth network, PRDepth features a depth-weighted incentive module, which includes a global depth enhancement module (GDEM) in the encoder-decoder and a weight-adaptive incentive module (WAIM) in the decoder. The GDEM improves the network’s ability to extract depth information in complex scenes by interacting with global cross-dimensional data. An attention-guided mechanism is adopted by the WAIM to aggregate multiscale feature information and assign adaptive weights to different features, ensuring efficient global context fusion and suppression of redundant information. Experimental results demonstrate that our method significantly outperforms existing state-of-the-art self-supervised monocular depth estimation techniques for indoor scenes. 
Extensive ablation studies are conducted on each module of PRDepth. PRDepth demonstrates precise depth estimation and robust generalization across indoor datasets, including NYUv2, 7-Scenes, ScanNet, and InteriorNet.","PeriodicalId":13341,"journal":{"name":"IEEE Transactions on Instrumentation and Measurement","volume":"74 ","pages":"1-16"},"PeriodicalIF":5.6000,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PRDepth: Pose Refinement Enhancement-Based Monocular Depth Estimation for Indoor Scenes\",\"authors\":\"Chenggong Han;Chen Lv;Xiaolin Huang;Qiqi Kou;Deqiang Cheng;He Jiang\",\"doi\":\"10.1109/TIM.2025.3562976\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Indoor depth measurement is widely used in technologies such as virtual reality and augmented reality. However, indoor scenes are typically captured with handheld cameras, resulting in more complex and unpredictable variations between frames. Self-supervised depth estimation relies on frame-to-frame projection for self-constraint, and inaccurate pose predictions between frames significantly hinder depth estimation in indoor environments. To address this issue, PRDepth is proposed, a self-supervised pose refinement method tailored for indoor environments. PRDepth introduces a pose reconstruction iterative module (PRIM) that refines multiframe pose decomposition and reconstruction. By leveraging contextual information from intermediate frames, it mitigates estimation errors caused by large rotations and reduces training errors due to inaccurate rotations, leading to more precise pose predictions between frames. 
Additionally, to enhance the information exchange and context integration capabilities of the depth network, PRDepth features a depth-weighted incentive module, which includes a global depth enhancement module (GDEM) in the encoder-decoder and a weight-adaptive incentive module (WAIM) in the decoder. The GDEM improves the network’s ability to extract depth information in complex scenes by interacting with global cross-dimensional data. An attention-guided mechanism is adopted by the WAIM to aggregate multiscale feature information and assign adaptive weights to different features, ensuring efficient global context fusion and suppression of redundant information. Experimental results demonstrate that our method significantly outperforms existing state-of-the-art self-supervised monocular depth estimation techniques for indoor scenes. Extensive ablation studies are conducted on each module of PRDepth. PRDepth demonstrates precise depth estimation and robust generalization across indoor datasets, including NYUv2, 7-Scenes, ScanNet, and InteriorNet.\",\"PeriodicalId\":13341,\"journal\":{\"name\":\"IEEE Transactions on Instrumentation and Measurement\",\"volume\":\"74 \",\"pages\":\"1-16\"},\"PeriodicalIF\":5.6000,\"publicationDate\":\"2025-04-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Instrumentation and Measurement\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10974737/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Instrumentation and 
Measurement","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10974737/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
PRDepth: Pose Refinement Enhancement-Based Monocular Depth Estimation for Indoor Scenes
Indoor depth measurement is widely used in technologies such as virtual reality and augmented reality. However, indoor scenes are typically captured with handheld cameras, resulting in more complex and unpredictable variations between frames. Self-supervised depth estimation relies on frame-to-frame projection for self-constraint, and inaccurate pose predictions between frames significantly hinder depth estimation in indoor environments. To address this issue, we propose PRDepth, a self-supervised pose-refinement method tailored to indoor environments. PRDepth introduces a pose reconstruction iterative module (PRIM) that refines multiframe pose decomposition and reconstruction. By leveraging contextual information from intermediate frames, PRIM mitigates estimation errors caused by large rotations and reduces training errors due to inaccurate rotations, yielding more precise pose predictions between frames. Additionally, to enhance the information-exchange and context-integration capabilities of the depth network, PRDepth features a depth-weighted incentive module, comprising a global depth enhancement module (GDEM) in the encoder-decoder and a weight-adaptive incentive module (WAIM) in the decoder. The GDEM improves the network's ability to extract depth information in complex scenes by interacting with global cross-dimensional data. The WAIM adopts an attention-guided mechanism to aggregate multiscale feature information and assign adaptive weights to different features, ensuring efficient global context fusion and suppression of redundant information. Experimental results demonstrate that our method significantly outperforms existing state-of-the-art self-supervised monocular depth estimation techniques for indoor scenes. Extensive ablation studies validate each module of PRDepth. PRDepth demonstrates precise depth estimation and robust generalization across indoor datasets, including NYUv2, 7-Scenes, ScanNet, and InteriorNet.
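The multiframe pose decomposition idea in the abstract can be illustrated with a minimal sketch: rather than regressing one large inter-frame motion directly, smaller relative poses through an intermediate frame are estimated and composed. The paper's actual PRIM architecture is not public in this abstract, so the helpers and the 15-degree example below are illustrative assumptions using plain NumPy, not the authors' implementation.

```python
import numpy as np

def rot_z(theta):
    """Rotation about the z-axis by theta radians (3x3 matrix)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def se3(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical example: a large 30-degree camera rotation between frames 0 and 2,
# decomposed into two 15-degree steps through the intermediate frame 1.
T_01 = se3(rot_z(np.deg2rad(15.0)), np.array([0.0, 0.0, 0.1]))
T_12 = se3(rot_z(np.deg2rad(15.0)), np.array([0.0, 0.0, 0.1]))

# Composing the per-step relative poses recovers the full frame-0-to-frame-2 motion:
# the rotation parts multiply to a 30-degree rotation, and the translations
# accumulate along the (rotated) path.
T_02 = T_12 @ T_01
```

Each per-step rotation is smaller and thus typically easier for a pose network to predict accurately, which is one plausible reading of how intermediate-frame context reduces errors from large rotations.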
Journal introduction:
Papers are sought that address innovative solutions to the development and use of electrical and electronic instruments and equipment to measure, monitor and/or record physical phenomena for the purpose of advancing measurement science, methods, functionality and applications. The scope of these papers may encompass: (1) theory, methodology, and practice of measurement; (2) design, development and evaluation of instrumentation and measurement systems and components used in generating, acquiring, conditioning and processing signals; (3) analysis, representation, display, and preservation of the information obtained from a set of measurements; and (4) scientific and technical support to establishment and maintenance of technical standards in the field of Instrumentation and Measurement.