Learn depth space from light field via a distance-constraint query mechanism

IF 7.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Hao Sheng, Rongshan Chen, Ruixuan Cong, Da Yang, Zhenglong Cui, Sizhe Wang
{"title":"Learn depth space from light field via a distance-constraint query mechanism","authors":"Hao Sheng ,&nbsp;Rongshan Chen ,&nbsp;Ruixuan Cong ,&nbsp;Da Yang ,&nbsp;Zhenglong Cui ,&nbsp;Sizhe Wang","doi":"10.1016/j.patcog.2025.112403","DOIUrl":null,"url":null,"abstract":"<div><div>The Light Field (LF) captures both spatial and angular information of scenes, enabling precise depth estimation. Recent advancements in deep learning have led to significant success in this field; however, existing methods primarily focus on modeling surface characteristics (e.g., depth maps) while overlooking the depth space, which contains additional valuable information. The depth space consists of numerous space points and provides substantially more geometric data than a single depth map. In this paper, we conceptualize depth prediction as a spatial modeling problem, aiming to learn the entire depth space rather than merely a single depth map. Specifically, we define space points as signed distances relative to the scene surface and propose a novel distance-constraint query mechanism for LF depth estimation. To model the depth space effectively, we first develop a mixed sampling strategy to approximate its data representation. Subsequently, we introduce an encoder-decoder network architecture to query the distances of each point, thereby implicitly embedding the depth space. Finally, to extract the target depth map from this space, we present a generation algorithm that iteratively invokes the decoder network. Through extensive experiments, our approach achieves the highest performance on LF depth estimation benchmarks, and also demonstrates superior performance on various synthetic and real-world scenes.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112403"},"PeriodicalIF":7.6000,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320325010647","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

The Light Field (LF) captures both spatial and angular information of scenes, enabling precise depth estimation. Recent advancements in deep learning have led to significant success in this field; however, existing methods primarily focus on modeling surface characteristics (e.g., depth maps) while overlooking the depth space, which contains additional valuable information. The depth space consists of numerous space points and provides substantially more geometric data than a single depth map. In this paper, we conceptualize depth prediction as a spatial modeling problem, aiming to learn the entire depth space rather than merely a single depth map. Specifically, we define space points as signed distances relative to the scene surface and propose a novel distance-constraint query mechanism for LF depth estimation. To model the depth space effectively, we first develop a mixed sampling strategy to approximate its data representation. Subsequently, we introduce an encoder-decoder network architecture to query the distances of each point, thereby implicitly embedding the depth space. Finally, to extract the target depth map from this space, we present a generation algorithm that iteratively invokes the decoder network. Through extensive experiments, our approach achieves the highest performance on LF depth estimation benchmarks, and also demonstrates superior performance on various synthetic and real-world scenes.
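The query-and-extract idea sketched in the abstract can be illustrated with a short, hypothetical example: an encoder produces per-pixel features from the LF sub-aperture views, a decoder is queried with a candidate depth per pixel and returns its signed distance to the scene surface, and the depth map is extracted by iteratively stepping each query toward the zero-level set. The module names, tensor shapes, sign convention, and update rule below are assumptions for illustration only; they are not the paper's actual architecture, mixed sampling strategy, or generation algorithm.

```python
# Minimal illustrative sketch (not the authors' implementation) of a
# signed-distance query mechanism for LF depth estimation.
import torch
import torch.nn as nn

class LFEncoder(nn.Module):
    """Encodes a stack of light-field sub-aperture views into per-pixel features."""
    def __init__(self, num_views: int = 81, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_views, feat_dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, lf_views: torch.Tensor) -> torch.Tensor:
        # lf_views: (B, num_views, H, W) grayscale sub-aperture images
        return self.net(lf_views)  # (B, feat_dim, H, W)

class DistanceDecoder(nn.Module):
    """Queries a candidate depth per pixel and predicts its signed distance to
    the scene surface (sign convention assumed here: zero on the surface)."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_dim + 1, feat_dim, 1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, 1, 1),
        )

    def forward(self, feats: torch.Tensor, depth_query: torch.Tensor) -> torch.Tensor:
        # feats: (B, feat_dim, H, W); depth_query: (B, 1, H, W)
        return self.net(torch.cat([feats, depth_query], dim=1))  # (B, 1, H, W)

@torch.no_grad()
def extract_depth(encoder, decoder, lf_views, init_depth, num_iters: int = 8):
    """Iteratively moves each per-pixel query toward the zero crossing of the
    predicted signed-distance field, yielding a depth map."""
    feats = encoder(lf_views)
    depth = init_depth.clone()
    for _ in range(num_iters):
        signed_dist = decoder(feats, depth)
        depth = depth - signed_dist  # step toward the surface
    return depth

if __name__ == "__main__":
    enc, dec = LFEncoder(), DistanceDecoder()
    lf = torch.rand(1, 81, 64, 64)   # 9x9 views, 64x64 pixels
    d0 = torch.zeros(1, 1, 64, 64)   # start all queries at depth 0
    print(extract_depth(enc, dec, lf, d0).shape)  # torch.Size([1, 1, 64, 64])
```

The key difference from a conventional regression network is that the decoder is queried at arbitrary depths rather than predicting a single value, so the learned function implicitly covers the whole depth space rather than only the surface.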
Source journal: Pattern Recognition (Engineering & Technology – Engineering: Electrical & Electronic)
CiteScore: 14.40
Self-citation rate: 16.20%
Articles published per year: 683
Average time to review: 5.6 months
Journal description: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas such as biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.