{"title":"基于几何先验的双投影融合网络的单目全景深度估计","authors":"Chengchao Huang;Feng Shao;Hangwei Chen;Baoyang Mu;Long Xu","doi":"10.1109/TCSVT.2025.3553472","DOIUrl":null,"url":null,"abstract":"Panoramic depth estimation is crucial for acquiring comprehensive 3D environmental perception information, serving as a foundational basis for numerous panoramic vision tasks. The key challenge in panoramic depth estimation is how to address various distortions in 360° omnidirectional images. Most panoramic images are displayed as 2D equirectangular projections, which exhibit significant distortion, particularly with the severe fisheye effect near the equatorial regions. Traditional depth estimation methods for perspective images are unsuitable for such projections. On the other hand, cubemap projection consists of six distortion-free perspective images, allowing the use of existing depth estimation methods. However, the boundaries between faces of a cubemap projection introduce discontinuities, causing a loss of global information when using cube maps alone. In this work, we propose an innovative geometric priors assisted dual-projection fusion network (GADFNet) that leverages geometric priors of panoramic images and the strengths of both projection types to enhance the accuracy of panoramic depth estimation. Specifically, to better focus the network on key areas, we introduce a distortion perception module (DPM) and incorporate geometric information into the loss function. To more effectively extract global information from the equirectangular projection branch, we propose a scene understanding module (SUM), which captures features from different dimensions. Additionally, to achieve effective fusion of the two projections, we design a dual projection adaptive fusion module (DPAFM) to dynamically adjust the weights of the two branches during fusion. Extensive experiments conducted on four public datasets (including both virtual and real-world scenarios) demonstrate that our proposed GADFNet outperforms existing methods, achieving superior performance.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9060-9074"},"PeriodicalIF":11.1000,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"GADFNet: Geometric Priors Assisted Dual-Projection Fusion Network for Monocular Panoramic Depth Estimation\",\"authors\":\"Chengchao Huang;Feng Shao;Hangwei Chen;Baoyang Mu;Long Xu\",\"doi\":\"10.1109/TCSVT.2025.3553472\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Panoramic depth estimation is crucial for acquiring comprehensive 3D environmental perception information, serving as a foundational basis for numerous panoramic vision tasks. The key challenge in panoramic depth estimation is how to address various distortions in 360° omnidirectional images. Most panoramic images are displayed as 2D equirectangular projections, which exhibit significant distortion, particularly with the severe fisheye effect near the equatorial regions. Traditional depth estimation methods for perspective images are unsuitable for such projections. On the other hand, cubemap projection consists of six distortion-free perspective images, allowing the use of existing depth estimation methods. However, the boundaries between faces of a cubemap projection introduce discontinuities, causing a loss of global information when using cube maps alone. 
In this work, we propose an innovative geometric priors assisted dual-projection fusion network (GADFNet) that leverages geometric priors of panoramic images and the strengths of both projection types to enhance the accuracy of panoramic depth estimation. Specifically, to better focus the network on key areas, we introduce a distortion perception module (DPM) and incorporate geometric information into the loss function. To more effectively extract global information from the equirectangular projection branch, we propose a scene understanding module (SUM), which captures features from different dimensions. Additionally, to achieve effective fusion of the two projections, we design a dual projection adaptive fusion module (DPAFM) to dynamically adjust the weights of the two branches during fusion. Extensive experiments conducted on four public datasets (including both virtual and real-world scenarios) demonstrate that our proposed GADFNet outperforms existing methods, achieving superior performance.\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"35 9\",\"pages\":\"9060-9074\"},\"PeriodicalIF\":11.1000,\"publicationDate\":\"2025-03-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10937216/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10937216/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Panoramic depth estimation is crucial for acquiring comprehensive 3D environmental perception information and serves as a foundation for numerous panoramic vision tasks. The key challenge in panoramic depth estimation is handling the various distortions in 360° omnidirectional images. Most panoramic images are displayed as 2D equirectangular projections, which exhibit significant distortion, particularly the severe fisheye-like stretching near the polar regions. Traditional depth estimation methods designed for perspective images are unsuitable for such projections. Cubemap projection, on the other hand, consists of six distortion-free perspective images, allowing existing depth estimation methods to be applied directly. However, the boundaries between the faces of a cubemap projection introduce discontinuities, so global information is lost when cubemaps are used alone. In this work, we propose a geometric priors assisted dual-projection fusion network (GADFNet) that leverages the geometric priors of panoramic images and the complementary strengths of both projection types to improve the accuracy of panoramic depth estimation. Specifically, to focus the network on key areas, we introduce a distortion perception module (DPM) and incorporate geometric information into the loss function. To extract global information from the equirectangular projection branch more effectively, we propose a scene understanding module (SUM) that captures features across different dimensions. Additionally, to fuse the two projections effectively, we design a dual projection adaptive fusion module (DPAFM) that dynamically adjusts the weights of the two branches during fusion. Extensive experiments on four public datasets, covering both virtual and real-world scenes, demonstrate that GADFNet outperforms existing methods.
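To make the projection geometry concrete: an equirectangular projection (ERP) maps image columns to longitude and rows to latitude, so near the poles many pixels share almost the same viewing direction, which is the stretching distortion the abstract describes. The sketch below is a standard ERP convention, not code from the paper; the function name and argument layout are ours.

import math
import torch

def erp_pixel_to_direction(u: torch.Tensor, v: torch.Tensor,
                           width: int, height: int) -> torch.Tensor:
    """Map ERP pixel centers (u, v) to unit viewing directions.

    Columns index longitude in [-pi, pi); rows index latitude from
    pi/2 (top) to -pi/2 (bottom). Near the poles many pixels map to
    nearly identical directions, i.e. the scene is oversampled there.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    x = torch.cos(lat) * torch.sin(lon)
    y = torch.sin(lat)
    z = torch.cos(lat) * torch.cos(lon)
    return torch.stack([x, y, z], dim=-1)  # (..., 3), unit length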
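The abstract states that geometric information is incorporated into the loss but does not specify how. A common choice for ERP predictions, and one plausible reading, is to weight per-pixel errors by the solid angle each pixel subtends on the sphere (proportional to the cosine of its latitude), so that the oversampled polar rows do not dominate training. A minimal sketch under that assumption; the function names are illustrative, not the paper's.

import math
import torch

def erp_solid_angle_weights(height: int, width: int) -> torch.Tensor:
    """Per-pixel weights proportional to cos(latitude): largest at the
    equator, vanishing toward the poles where ERP oversamples the scene."""
    v = (torch.arange(height, dtype=torch.float32) + 0.5) / height  # (H,)
    lat = (0.5 - v) * math.pi                                       # (H,)
    return torch.cos(lat).unsqueeze(1).expand(height, width)        # (H, W)

def weighted_l1_depth_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """L1 depth loss over ERP depth maps, weighted by pixel solid angle.
    pred, target: (B, 1, H, W)."""
    b, _, h, w = pred.shape
    weights = erp_solid_angle_weights(h, w).to(pred.device)
    err = (pred - target).abs().squeeze(1)            # (B, H, W)
    return (weights * err).sum() / (weights.sum() * b)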
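Likewise, the DPAFM's internals are not described beyond "dynamically adjusting the weights of the two branches". One standard realization of adaptive dual-branch fusion is a learned per-pixel gate over the concatenated features. The sketch below assumes the cubemap features have already been re-projected into the ERP layout so the two maps are spatially aligned; the module and argument names are hypothetical.

import torch
import torch.nn as nn

class AdaptiveDualFusion(nn.Module):
    """Per-pixel gated fusion of two spatially aligned feature maps,
    in the spirit of (but not taken from) the paper's DPAFM."""

    def __init__(self, channels: int):
        super().__init__()
        # Predict one weight per branch at every spatial location.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1),
            nn.Softmax(dim=1),  # the two branch weights sum to 1
        )

    def forward(self, feat_erp: torch.Tensor, feat_cmp: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([feat_erp, feat_cmp], dim=1))  # (B, 2, H, W)
        return w[:, 0:1] * feat_erp + w[:, 1:2] * feat_cmp

The softmax gate keeps the fused map in the same range as its inputs; attention-based variants of the gate would be an equally plausible reading of the abstract.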
Journal description:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.