A survey on conventional and learning‐based methods for multi‐view stereo

Elisavet (Ellie) Konstantina Stathopoulou, F. Remondino
{"title":"A survey on conventional and learning‐based methods for multi‐view stereo","authors":"Elisavet (Ellie) Konstantina Stathopoulou, F. Remondino","doi":"10.1111/phor.12456","DOIUrl":null,"url":null,"abstract":"3D reconstruction of scenes using multiple images, relying on robust correspondence search and depth estimation, has been thoroughly studied for the two‐view and multi‐view scenarios in recent years. Multi‐view stereo (MVS) algorithms aim to generate a rich, dense 3D model of the scene in the form of a dense point cloud or a triangulated mesh. In a typical MVS pipeline, the robust estimations for the camera poses along with the sparse points obtained from structure from motion (SfM) are used as input. During this process, the depth of generally every pixel of the scene is to be calculated. Several methods, either conventional or, more recently, learning‐based have been developed for solving the correspondence search problem. A vast amount of research exists in the literature using local, global or semi‐global stereomatching approaches, with the PatchMatch algorithm being among the most popular and efficient conventional ones in the last decade. Yet, and despite the widespread evolution of the algorithms, yielding complete, accurate and aesthetically pleasing 3D representations of a scene remains an open issue in real‐world and large‐scale photogrammetric applications. This work aims to provide a concrete survey on the most widely used MVS methods, investigating underlying concepts and challenges. To this end, the theoretical background and relative literature are discussed for both conventional and learning‐based approaches, with a particular focus on close‐range 3D reconstruction applications.","PeriodicalId":22881,"journal":{"name":"The Photogrammetric Record","volume":"96 1","pages":"374 - 407"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Photogrammetric Record","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1111/phor.12456","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

3D reconstruction of scenes using multiple images, relying on robust correspondence search and depth estimation, has been thoroughly studied for the two‐view and multi‐view scenarios in recent years. Multi‐view stereo (MVS) algorithms aim to generate a rich, dense 3D model of the scene in the form of a dense point cloud or a triangulated mesh. In a typical MVS pipeline, the robust estimations for the camera poses along with the sparse points obtained from structure from motion (SfM) are used as input. During this process, the depth of generally every pixel of the scene is to be calculated. Several methods, either conventional or, more recently, learning‐based have been developed for solving the correspondence search problem. A vast amount of research exists in the literature using local, global or semi‐global stereomatching approaches, with the PatchMatch algorithm being among the most popular and efficient conventional ones in the last decade. Yet, and despite the widespread evolution of the algorithms, yielding complete, accurate and aesthetically pleasing 3D representations of a scene remains an open issue in real‐world and large‐scale photogrammetric applications. This work aims to provide a concrete survey on the most widely used MVS methods, investigating underlying concepts and challenges. To this end, the theoretical background and relative literature are discussed for both conventional and learning‐based approaches, with a particular focus on close‐range 3D reconstruction applications.

Abstract Image

多视点立体视觉的传统方法和基于学习的方法综述
近年来,基于鲁棒对应搜索和深度估计的多幅图像场景三维重建研究已经深入到双视图和多视图场景中。多视图立体(MVS)算法旨在以密集点云或三角网格的形式生成丰富、密集的场景3D模型。在典型的MVS流水线中,使用摄像机姿态的鲁棒估计和由运动构造(SfM)得到的稀疏点作为输入。在这个过程中,通常要计算场景中每个像素的深度。有几种方法,无论是传统的还是最近的基于学习的,都被开发出来用于解决对应搜索问题。文献中存在大量使用局部、全局或半全局立体匹配方法的研究,其中PatchMatch算法是过去十年中最流行和最有效的传统算法之一。然而,尽管算法得到了广泛的发展,但在现实世界和大规模摄影测量应用中,生成完整、准确和美观的场景3D表示仍然是一个悬而未决的问题。这项工作旨在对最广泛使用的MVS方法进行具体调查,研究潜在的概念和挑战。为此,本文讨论了传统方法和基于学习的方法的理论背景和相关文献,特别关注了近距离三维重建的应用。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信