Geometry-Aware Self-Supervised Indoor 360$^{\circ }$ Depth Estimation via Asymmetric Dual-Domain Collaborative Learning

Impact Factor: 9.7 | CAS Tier 1, Computer Science | JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS
Xu Wang;Ziyan He;Qiudan Zhang;You Yang;Tiesong Zhao;Jianmin Jiang
{"title":"Geometry-Aware Self-Supervised Indoor 360$^{\\circ }$ Depth Estimation via Asymmetric Dual-Domain Collaborative Learning","authors":"Xu Wang;Ziyan He;Qiudan Zhang;You Yang;Tiesong Zhao;Jianmin Jiang","doi":"10.1109/TMM.2025.3535340","DOIUrl":null,"url":null,"abstract":"Being able to estimate monocular depth for spherical panoramas is of fundamental importance in 3D scene perception. However, spherical distortion severely limits the effectiveness of vanilla convolutions. To push the envelope of accuracy, recent approaches attempt to utilize Tangent projection (TP) to estimate the depth of <inline-formula><tex-math>$360 ^{\\circ }$</tex-math></inline-formula> images. Yet, these methods still suffer from discrepancies and inconsistencies among patch-wise tangent images, as well as the lack of accurate ground truth depth maps under a supervised fashion. In this paper, we propose a geometry-aware self-supervised <inline-formula><tex-math>$360 ^{\\circ }$</tex-math></inline-formula> image depth estimation methodology that explores the complementary advantages of TP and Equirectangular projection (ERP) by an asymmetric dual-domain collaborative learning strategy. Especially, we first develop a lightweight asymmetric dual-domain depth estimation network, which enables to aggregate depth-related features from a single TP domain, and then produce depth distributions of the TP and ERP domains via collaborative learning. This effectively mitigates stitching artifacts and preserves fine details in depth inference without overspending model parameters. In addition, a frequent-spatial feature concentration module is devised to simultaneously capture non-local Fourier features and local spatial features, such that facilitating the efficient exploration of monocular depth cues. Moreover, we introduce a geometric structural alignment module to further improve geometric structural consistency among tangent images. Extensive experiments illustrate that our designed approach outperforms existing self-supervised <inline-formula><tex-math>$360 ^{\\circ }$</tex-math></inline-formula> depth estimation methods on three publicly available benchmark datasets.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"3224-3237"},"PeriodicalIF":9.7000,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10855624/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Estimating monocular depth for spherical panoramas is of fundamental importance in 3D scene perception. However, spherical distortion severely limits the effectiveness of vanilla convolutions. To push the envelope of accuracy, recent approaches utilize tangent projection (TP) to estimate the depth of $360^{\circ}$ images. Yet, these methods still suffer from discrepancies and inconsistencies among patch-wise tangent images, as well as from the scarcity of accurate ground-truth depth maps required for supervised training. In this paper, we propose a geometry-aware self-supervised $360^{\circ}$ image depth estimation methodology that exploits the complementary advantages of TP and equirectangular projection (ERP) through an asymmetric dual-domain collaborative learning strategy. Specifically, we first develop a lightweight asymmetric dual-domain depth estimation network, which aggregates depth-related features from a single TP domain and then produces depth distributions for both the TP and ERP domains via collaborative learning. This effectively mitigates stitching artifacts and preserves fine details in depth inference without inflating the parameter budget. In addition, a frequency-spatial feature concentration module is devised to simultaneously capture non-local Fourier features and local spatial features, thereby facilitating efficient exploitation of monocular depth cues. Moreover, we introduce a geometric structural alignment module to further improve geometric structural consistency among tangent images. Extensive experiments demonstrate that our approach outperforms existing self-supervised $360^{\circ}$ depth estimation methods on three publicly available benchmark datasets.
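The abstract leans on tangent projection (TP) as the distortion-aware alternative to processing the equirectangular (ERP) image directly. To make that geometry concrete, the sketch below builds the standard inverse gnomonic projection sampling grid that maps tangent-plane coordinates back to ERP coordinates, so a TP patch can be cut from a panorama with bilinear sampling. This is a minimal PyTorch illustration of the general TP/ERP relationship, not the authors' implementation; the function name, patch resolution, and field of view are assumptions.

```python
import math
import torch
import torch.nn.functional as F

def tangent_sampling_grid(h_t, w_t, fov, lon0, lat0):
    """Normalized grid that extracts a gnomonic (tangent-plane) patch from an
    equirectangular image via F.grid_sample. Angles are in radians; row 0 of
    the ERP image is assumed to lie at latitude +pi/2 (image top)."""
    half = math.tan(fov / 2.0)  # half-extent of the tangent plane
    ys = torch.linspace(-half, half, h_t)
    xs = torch.linspace(-half, half, w_t)
    y, x = torch.meshgrid(ys, xs, indexing="ij")

    # Inverse gnomonic projection: plane (x, y) -> sphere (lon, lat).
    rho = torch.sqrt(x * x + y * y).clamp(min=1e-8)
    c = torch.atan(rho)
    sin_c, cos_c = torch.sin(c), torch.cos(c)
    lat = torch.asin(cos_c * math.sin(lat0)
                     + y * sin_c * math.cos(lat0) / rho)
    lon = lon0 + torch.atan2(
        x * sin_c,
        rho * math.cos(lat0) * cos_c - y * math.sin(lat0) * sin_c)
    lon = torch.remainder(lon + math.pi, 2 * math.pi) - math.pi  # wrap to [-pi, pi)

    # Sphere -> grid_sample coordinates in [-1, 1].
    u = lon / math.pi
    v = -2.0 * lat / math.pi
    return torch.stack([u, v], dim=-1).unsqueeze(0)  # (1, h_t, w_t, 2)

# Example: cut a 128x128 tangent patch from a 512x1024 ERP image.
erp = torch.rand(1, 3, 512, 1024)
grid = tangent_sampling_grid(128, 128, fov=math.radians(80), lon0=0.0, lat0=0.0)
patch = F.grid_sample(erp, grid, mode="bilinear", align_corners=True)
```

A TP-based pipeline would call this once per patch center (lon0, lat0) to cover the sphere with overlapping tangent images; the discrepancies among such patches are exactly what the paper's collaborative learning and geometric structural alignment modules are designed to reconcile.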
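The frequency-spatial feature concentration module is described only at a high level: it jointly captures non-local Fourier features and local spatial features. A common way to realize that idea (in the spirit of fast Fourier convolution) is a two-branch block in which pointwise convolutions applied in the Fourier domain provide a global receptive field while an ordinary convolution keeps local detail. The following is a hedged sketch under that assumption; all layer widths and the fusion scheme are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FreqSpatialBlock(nn.Module):
    """Toy two-branch block: a Fourier branch for non-local context and a
    convolutional branch for local detail. Sizes are illustrative assumptions."""

    def __init__(self, channels):
        super().__init__()
        # Spectral branch mixes real/imaginary parts jointly (2x channels).
        self.spectral = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1))
        # Spatial branch: an ordinary local convolution.
        self.local = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        # Non-local path: a 1x1 conv in the Fourier domain acts globally,
        # since every frequency bin depends on all spatial positions.
        spec = torch.fft.rfft2(x, norm="ortho")          # complex (b, c, h, w//2+1)
        spec = torch.cat([spec.real, spec.imag], dim=1)  # (b, 2c, h, w//2+1)
        spec = self.spectral(spec)
        real, imag = spec.chunk(2, dim=1)
        glob = torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")
        # Fuse the global (Fourier) and local (spatial) features.
        return self.fuse(torch.cat([glob, self.local(x)], dim=1))

# Quick shape check.
block = FreqSpatialBlock(32)
out = block(torch.rand(2, 32, 64, 64))  # -> (2, 32, 64, 64)
```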
Source Journal
IEEE Transactions on Multimedia (Engineering & Technology - Telecommunications)
CiteScore: 11.70
Self-citation rate: 11.00%
Annual publications: 576
Review time: 5.5 months
Journal introduction: The IEEE Transactions on Multimedia covers diverse aspects of multimedia technology and applications, including circuits, networking, signal processing, systems, software, and systems integration. Its scope aligns with the Fields of Interest of the sponsors, ensuring comprehensive coverage of multimedia research.