{"title":"基于几何先验的双投影融合网络的单目全景深度估计","authors":"Chengchao Huang;Feng Shao;Hangwei Chen;Baoyang Mu;Long Xu","doi":"10.1109/TCSVT.2025.3553472","DOIUrl":null,"url":null,"abstract":"Panoramic depth estimation is crucial for acquiring comprehensive 3D environmental perception information, serving as a foundational basis for numerous panoramic vision tasks. The key challenge in panoramic depth estimation is how to address various distortions in 360° omnidirectional images. Most panoramic images are displayed as 2D equirectangular projections, which exhibit significant distortion, particularly with the severe fisheye effect near the equatorial regions. Traditional depth estimation methods for perspective images are unsuitable for such projections. On the other hand, cubemap projection consists of six distortion-free perspective images, allowing the use of existing depth estimation methods. However, the boundaries between faces of a cubemap projection introduce discontinuities, causing a loss of global information when using cube maps alone. In this work, we propose an innovative geometric priors assisted dual-projection fusion network (GADFNet) that leverages geometric priors of panoramic images and the strengths of both projection types to enhance the accuracy of panoramic depth estimation. Specifically, to better focus the network on key areas, we introduce a distortion perception module (DPM) and incorporate geometric information into the loss function. To more effectively extract global information from the equirectangular projection branch, we propose a scene understanding module (SUM), which captures features from different dimensions. Additionally, to achieve effective fusion of the two projections, we design a dual projection adaptive fusion module (DPAFM) to dynamically adjust the weights of the two branches during fusion. Extensive experiments conducted on four public datasets (including both virtual and real-world scenarios) demonstrate that our proposed GADFNet outperforms existing methods, achieving superior performance.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9060-9074"},"PeriodicalIF":11.1000,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"GADFNet: Geometric Priors Assisted Dual-Projection Fusion Network for Monocular Panoramic Depth Estimation\",\"authors\":\"Chengchao Huang;Feng Shao;Hangwei Chen;Baoyang Mu;Long Xu\",\"doi\":\"10.1109/TCSVT.2025.3553472\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Panoramic depth estimation is crucial for acquiring comprehensive 3D environmental perception information, serving as a foundational basis for numerous panoramic vision tasks. The key challenge in panoramic depth estimation is how to address various distortions in 360° omnidirectional images. Most panoramic images are displayed as 2D equirectangular projections, which exhibit significant distortion, particularly with the severe fisheye effect near the equatorial regions. Traditional depth estimation methods for perspective images are unsuitable for such projections. On the other hand, cubemap projection consists of six distortion-free perspective images, allowing the use of existing depth estimation methods. However, the boundaries between faces of a cubemap projection introduce discontinuities, causing a loss of global information when using cube maps alone. 
In this work, we propose an innovative geometric priors assisted dual-projection fusion network (GADFNet) that leverages geometric priors of panoramic images and the strengths of both projection types to enhance the accuracy of panoramic depth estimation. Specifically, to better focus the network on key areas, we introduce a distortion perception module (DPM) and incorporate geometric information into the loss function. To more effectively extract global information from the equirectangular projection branch, we propose a scene understanding module (SUM), which captures features from different dimensions. Additionally, to achieve effective fusion of the two projections, we design a dual projection adaptive fusion module (DPAFM) to dynamically adjust the weights of the two branches during fusion. Extensive experiments conducted on four public datasets (including both virtual and real-world scenarios) demonstrate that our proposed GADFNet outperforms existing methods, achieving superior performance.\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"35 9\",\"pages\":\"9060-9074\"},\"PeriodicalIF\":11.1000,\"publicationDate\":\"2025-03-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10937216/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10937216/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Panoramic depth estimation is crucial for acquiring comprehensive 3D environmental perception information and serves as a foundation for numerous panoramic vision tasks. The key challenge in panoramic depth estimation is handling the various distortions in 360° omnidirectional images. Most panoramic images are displayed as 2D equirectangular projections, which exhibit significant distortion, particularly the severe fisheye-like stretching near the polar regions. Traditional depth estimation methods designed for perspective images are unsuitable for such projections. Cubemap projection, on the other hand, consists of six distortion-free perspective images, allowing existing depth estimation methods to be applied directly. However, the boundaries between the faces of a cubemap projection introduce discontinuities, so global information is lost when cubemaps are used alone. In this work, we propose a geometric priors assisted dual-projection fusion network (GADFNet) that leverages the geometric priors of panoramic images and the complementary strengths of both projection types to improve the accuracy of panoramic depth estimation. Specifically, to focus the network on key areas, we introduce a distortion perception module (DPM) and incorporate geometric information into the loss function. To extract global information from the equirectangular projection branch more effectively, we propose a scene understanding module (SUM) that captures features across different dimensions. Additionally, to fuse the two projections effectively, we design a dual projection adaptive fusion module (DPAFM) that dynamically adjusts the weights of the two branches during fusion. Extensive experiments on four public datasets, covering both virtual and real-world scenes, demonstrate that GADFNet outperforms existing methods.
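To make the projection geometry concrete: an equirectangular projection (ERP) maps image columns to longitude and rows to latitude, so near the poles many pixels share almost the same viewing direction, which is the stretching distortion the abstract describes. The sketch below is a standard ERP convention, not code from the paper; the function name and argument layout are ours.

import math
import torch

def erp_pixel_to_direction(u: torch.Tensor, v: torch.Tensor,
                           width: int, height: int) -> torch.Tensor:
    """Map ERP pixel centers (u, v) to unit viewing directions.

    Columns index longitude in [-pi, pi); rows index latitude from
    pi/2 (top) to -pi/2 (bottom). Near the poles many pixels map to
    nearly identical directions, i.e. the scene is oversampled there.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    x = torch.cos(lat) * torch.sin(lon)
    y = torch.sin(lat)
    z = torch.cos(lat) * torch.cos(lon)
    return torch.stack([x, y, z], dim=-1)  # (..., 3), unit length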
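The abstract states that geometric information is incorporated into the loss but does not specify how. A common choice for ERP predictions, and one plausible reading, is to weight per-pixel errors by the solid angle each pixel subtends on the sphere (proportional to the cosine of its latitude), so that the oversampled polar rows do not dominate training. A minimal sketch under that assumption; the function names are illustrative, not the paper's.

import math
import torch

def erp_solid_angle_weights(height: int, width: int) -> torch.Tensor:
    """Per-pixel weights proportional to cos(latitude): largest at the
    equator, vanishing toward the poles where ERP oversamples the scene."""
    v = (torch.arange(height, dtype=torch.float32) + 0.5) / height  # (H,)
    lat = (0.5 - v) * math.pi                                       # (H,)
    return torch.cos(lat).unsqueeze(1).expand(height, width)        # (H, W)

def weighted_l1_depth_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """L1 depth loss over ERP depth maps, weighted by pixel solid angle.
    pred, target: (B, 1, H, W)."""
    b, _, h, w = pred.shape
    weights = erp_solid_angle_weights(h, w).to(pred.device)
    err = (pred - target).abs().squeeze(1)            # (B, H, W)
    return (weights * err).sum() / (weights.sum() * b)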
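Likewise, the DPAFM's internals are not described beyond "dynamically adjusting the weights of the two branches". One standard realization of adaptive dual-branch fusion is a learned per-pixel gate over the concatenated features. The sketch below assumes the cubemap features have already been re-projected into the ERP layout so the two maps are spatially aligned; the module and argument names are hypothetical.

import torch
import torch.nn as nn

class AdaptiveDualFusion(nn.Module):
    """Per-pixel gated fusion of two spatially aligned feature maps,
    in the spirit of (but not taken from) the paper's DPAFM."""

    def __init__(self, channels: int):
        super().__init__()
        # Predict one weight per branch at every spatial location.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1),
            nn.Softmax(dim=1),  # the two branch weights sum to 1
        )

    def forward(self, feat_erp: torch.Tensor, feat_cmp: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([feat_erp, feat_cmp], dim=1))  # (B, 2, H, W)
        return w[:, 0:1] * feat_erp + w[:, 1:2] * feat_cmp

The softmax gate keeps the fused map in the same range as its inputs; attention-based variants of the gate would be an equally plausible reading of the abstract.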
Journal description:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.