Learn depth space from light field via a distance-constraint query mechanism

IF 7.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Hao Sheng, Rongshan Chen, Ruixuan Cong, Da Yang, Zhenglong Cui, Sizhe Wang
{"title":"Learn depth space from light field via a distance-constraint query mechanism","authors":"Hao Sheng ,&nbsp;Rongshan Chen ,&nbsp;Ruixuan Cong ,&nbsp;Da Yang ,&nbsp;Zhenglong Cui ,&nbsp;Sizhe Wang","doi":"10.1016/j.patcog.2025.112403","DOIUrl":null,"url":null,"abstract":"<div><div>The Light Field (LF) captures both spatial and angular information of scenes, enabling precise depth estimation. Recent advancements in deep learning have led to significant success in this field; however, existing methods primarily focus on modeling surface characteristics (e.g., depth maps) while overlooking the depth space, which contains additional valuable information. The depth space consists of numerous space points and provides substantially more geometric data than a single depth map. In this paper, we conceptualize depth prediction as a spatial modeling problem, aiming to learn the entire depth space rather than merely a single depth map. Specifically, we define space points as signed distances relative to the scene surface and propose a novel distance-constraint query mechanism for LF depth estimation. To model the depth space effectively, we first develop a mixed sampling strategy to approximate its data representation. Subsequently, we introduce an encoder-decoder network architecture to query the distances of each point, thereby implicitly embedding the depth space. Finally, to extract the target depth map from this space, we present a generation algorithm that iteratively invokes the decoder network. Through extensive experiments, our approach achieves the highest performance on LF depth estimation benchmarks, and also demonstrates superior performance on various synthetic and real-world scenes.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112403"},"PeriodicalIF":7.6000,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320325010647","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

The Light Field (LF) captures both spatial and angular information of scenes, enabling precise depth estimation. Recent advancements in deep learning have led to significant success in this field; however, existing methods primarily focus on modeling surface characteristics (e.g., depth maps) while overlooking the depth space, which contains additional valuable information. The depth space consists of numerous space points and provides substantially more geometric data than a single depth map. In this paper, we conceptualize depth prediction as a spatial modeling problem, aiming to learn the entire depth space rather than merely a single depth map. Specifically, we define space points as signed distances relative to the scene surface and propose a novel distance-constraint query mechanism for LF depth estimation. To model the depth space effectively, we first develop a mixed sampling strategy to approximate its data representation. Subsequently, we introduce an encoder-decoder network architecture to query the distances of each point, thereby implicitly embedding the depth space. Finally, to extract the target depth map from this space, we present a generation algorithm that iteratively invokes the decoder network. Through extensive experiments, our approach achieves the highest performance on LF depth estimation benchmarks, and also demonstrates superior performance on various synthetic and real-world scenes.
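The query-and-extract idea sketched in the abstract can be illustrated with a short, hypothetical example: an encoder produces per-pixel features from the LF sub-aperture views, a decoder is queried with a candidate depth per pixel and returns its signed distance to the scene surface, and the depth map is extracted by iteratively stepping each query toward the zero-level set. The module names, tensor shapes, sign convention, and update rule below are assumptions for illustration only; they are not the paper's actual architecture, mixed sampling strategy, or generation algorithm.

```python
# Minimal illustrative sketch (not the authors' implementation) of a
# signed-distance query mechanism for LF depth estimation.
import torch
import torch.nn as nn

class LFEncoder(nn.Module):
    """Encodes a stack of light-field sub-aperture views into per-pixel features."""
    def __init__(self, num_views: int = 81, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_views, feat_dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, lf_views: torch.Tensor) -> torch.Tensor:
        # lf_views: (B, num_views, H, W) grayscale sub-aperture images
        return self.net(lf_views)  # (B, feat_dim, H, W)

class DistanceDecoder(nn.Module):
    """Queries a candidate depth per pixel and predicts its signed distance to
    the scene surface (sign convention assumed here: zero on the surface)."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_dim + 1, feat_dim, 1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, 1, 1),
        )

    def forward(self, feats: torch.Tensor, depth_query: torch.Tensor) -> torch.Tensor:
        # feats: (B, feat_dim, H, W); depth_query: (B, 1, H, W)
        return self.net(torch.cat([feats, depth_query], dim=1))  # (B, 1, H, W)

@torch.no_grad()
def extract_depth(encoder, decoder, lf_views, init_depth, num_iters: int = 8):
    """Iteratively moves each per-pixel query toward the zero crossing of the
    predicted signed-distance field, yielding a depth map."""
    feats = encoder(lf_views)
    depth = init_depth.clone()
    for _ in range(num_iters):
        signed_dist = decoder(feats, depth)
        depth = depth - signed_dist  # step toward the surface
    return depth

if __name__ == "__main__":
    enc, dec = LFEncoder(), DistanceDecoder()
    lf = torch.rand(1, 81, 64, 64)   # 9x9 views, 64x64 pixels
    d0 = torch.zeros(1, 1, 64, 64)   # start all queries at depth 0
    print(extract_depth(enc, dec, lf, d0).shape)  # torch.Size([1, 1, 64, 64])
```

The key difference from a conventional regression network is that the decoder is queried at arbitrary depths rather than predicting a single value, so the learned function implicitly covers the whole depth space rather than only the surface.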
Source journal: Pattern Recognition (Engineering & Technology – Engineering: Electrical & Electronic)
CiteScore: 14.40
Self-citation rate: 16.20%
Articles published per year: 683
Average time to review: 5.6 months
Journal description: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas such as biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.