{"title":"PRDepth:基于姿态细化增强的室内场景单目深度估计","authors":"Chenggong Han;Chen Lv;Xiaolin Huang;Qiqi Kou;Deqiang Cheng;He Jiang","doi":"10.1109/TIM.2025.3562976","DOIUrl":null,"url":null,"abstract":"Indoor depth measurement is widely used in technologies such as virtual reality and augmented reality. However, indoor scenes are typically captured with handheld cameras, resulting in more complex and unpredictable variations between frames. Self-supervised depth estimation relies on frame-to-frame projection for self-constraint, and inaccurate pose predictions between frames significantly hinder depth estimation in indoor environments. To address this issue, PRDepth is proposed, a self-supervised pose refinement method tailored for indoor environments. PRDepth introduces a pose reconstruction iterative module (PRIM) that refines multiframe pose decomposition and reconstruction. By leveraging contextual information from intermediate frames, it mitigates estimation errors caused by large rotations and reduces training errors due to inaccurate rotations, leading to more precise pose predictions between frames. Additionally, to enhance the information exchange and context integration capabilities of the depth network, PRDepth features a depth-weighted incentive module, which includes a global depth enhancement module (GDEM) in the encoder-decoder and a weight-adaptive incentive module (WAIM) in the decoder. The GDEM improves the network’s ability to extract depth information in complex scenes by interacting with global cross-dimensional data. An attention-guided mechanism is adopted by the WAIM to aggregate multiscale feature information and assign adaptive weights to different features, ensuring efficient global context fusion and suppression of redundant information. Experimental results demonstrate that our method significantly outperforms existing state-of-the-art self-supervised monocular depth estimation techniques for indoor scenes. 
Extensive ablation studies are conducted on each module of PRDepth. PRDepth demonstrates precise depth estimation and robust generalization across indoor datasets, including NYUv2, 7-Scenes, ScanNet, and InteriorNet.","PeriodicalId":13341,"journal":{"name":"IEEE Transactions on Instrumentation and Measurement","volume":"74 ","pages":"1-16"},"PeriodicalIF":5.6000,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PRDepth: Pose Refinement Enhancement-Based Monocular Depth Estimation for Indoor Scenes\",\"authors\":\"Chenggong Han;Chen Lv;Xiaolin Huang;Qiqi Kou;Deqiang Cheng;He Jiang\",\"doi\":\"10.1109/TIM.2025.3562976\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Indoor depth measurement is widely used in technologies such as virtual reality and augmented reality. However, indoor scenes are typically captured with handheld cameras, resulting in more complex and unpredictable variations between frames. Self-supervised depth estimation relies on frame-to-frame projection for self-constraint, and inaccurate pose predictions between frames significantly hinder depth estimation in indoor environments. To address this issue, PRDepth is proposed, a self-supervised pose refinement method tailored for indoor environments. PRDepth introduces a pose reconstruction iterative module (PRIM) that refines multiframe pose decomposition and reconstruction. By leveraging contextual information from intermediate frames, it mitigates estimation errors caused by large rotations and reduces training errors due to inaccurate rotations, leading to more precise pose predictions between frames. 
Additionally, to enhance the information exchange and context integration capabilities of the depth network, PRDepth features a depth-weighted incentive module, which includes a global depth enhancement module (GDEM) in the encoder-decoder and a weight-adaptive incentive module (WAIM) in the decoder. The GDEM improves the network’s ability to extract depth information in complex scenes by interacting with global cross-dimensional data. An attention-guided mechanism is adopted by the WAIM to aggregate multiscale feature information and assign adaptive weights to different features, ensuring efficient global context fusion and suppression of redundant information. Experimental results demonstrate that our method significantly outperforms existing state-of-the-art self-supervised monocular depth estimation techniques for indoor scenes. Extensive ablation studies are conducted on each module of PRDepth. PRDepth demonstrates precise depth estimation and robust generalization across indoor datasets, including NYUv2, 7-Scenes, ScanNet, and InteriorNet.\",\"PeriodicalId\":13341,\"journal\":{\"name\":\"IEEE Transactions on Instrumentation and Measurement\",\"volume\":\"74 \",\"pages\":\"1-16\"},\"PeriodicalIF\":5.6000,\"publicationDate\":\"2025-04-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Instrumentation and Measurement\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10974737/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Instrumentation and 
Measurement","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10974737/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
PRDepth: Pose Refinement Enhancement-Based Monocular Depth Estimation for Indoor Scenes
Indoor depth measurement is widely used in technologies such as virtual reality and augmented reality. However, indoor scenes are typically captured with handheld cameras, resulting in more complex and unpredictable variations between frames. Self-supervised depth estimation relies on frame-to-frame projection for self-constraint, and inaccurate pose predictions between frames significantly hinder depth estimation in indoor environments. To address this issue, we propose PRDepth, a self-supervised pose-refinement method tailored to indoor environments. PRDepth introduces a pose reconstruction iterative module (PRIM) that refines multiframe pose decomposition and reconstruction. By leveraging contextual information from intermediate frames, PRIM mitigates estimation errors caused by large rotations and reduces training errors due to inaccurate rotations, yielding more precise pose predictions between frames. Additionally, to enhance the information-exchange and context-integration capabilities of the depth network, PRDepth features a depth-weighted incentive module, comprising a global depth enhancement module (GDEM) in the encoder-decoder and a weight-adaptive incentive module (WAIM) in the decoder. The GDEM improves the network's ability to extract depth information in complex scenes by interacting with global cross-dimensional data. The WAIM adopts an attention-guided mechanism to aggregate multiscale feature information and assign adaptive weights to different features, ensuring efficient global context fusion and suppression of redundant information. Experimental results demonstrate that our method significantly outperforms existing state-of-the-art self-supervised monocular depth estimation techniques for indoor scenes. Extensive ablation studies validate each module of PRDepth. PRDepth demonstrates precise depth estimation and robust generalization across indoor datasets, including NYUv2, 7-Scenes, ScanNet, and InteriorNet.
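The multiframe pose decomposition idea in the abstract can be illustrated with a minimal sketch: rather than regressing one large inter-frame motion directly, smaller relative poses through an intermediate frame are estimated and composed. The paper's actual PRIM architecture is not public in this abstract, so the helpers and the 15-degree example below are illustrative assumptions using plain NumPy, not the authors' implementation.

```python
import numpy as np

def rot_z(theta):
    """Rotation about the z-axis by theta radians (3x3 matrix)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def se3(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical example: a large 30-degree camera rotation between frames 0 and 2,
# decomposed into two 15-degree steps through the intermediate frame 1.
T_01 = se3(rot_z(np.deg2rad(15.0)), np.array([0.0, 0.0, 0.1]))
T_12 = se3(rot_z(np.deg2rad(15.0)), np.array([0.0, 0.0, 0.1]))

# Composing the per-step relative poses recovers the full frame-0-to-frame-2 motion:
# the rotation parts multiply to a 30-degree rotation, and the translations
# accumulate along the (rotated) path.
T_02 = T_12 @ T_01
```

Each per-step rotation is smaller and thus typically easier for a pose network to predict accurately, which is one plausible reading of how intermediate-frame context reduces errors from large rotations.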
Journal introduction:
Papers are sought that address innovative solutions to the development and use of electrical and electronic instruments and equipment to measure, monitor and/or record physical phenomena for the purpose of advancing measurement science, methods, functionality and applications. The scope of these papers may encompass: (1) theory, methodology, and practice of measurement; (2) design, development and evaluation of instrumentation and measurement systems and components used in generating, acquiring, conditioning and processing signals; (3) analysis, representation, display, and preservation of the information obtained from a set of measurements; and (4) scientific and technical support to establishment and maintenance of technical standards in the field of Instrumentation and Measurement.