Multi-level pyramid fusion for efficient stereo matching

IF 3.5; CAS Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Multimedia Systems, Pub Date: 2024-08-12, DOI: 10.1007/s00530-024-01419-4

Jiaqi Zhu, Bin Li, Xinhua Zhao

Citations: 0

Abstract

Stereo matching is a key technology for many autonomous driving and robotics applications. Recently, methods based on convolutional neural networks (CNNs) have made significant progress. However, it is still difficult to find accurate matching points in inherently ill-posed regions, such as weakly textured areas and reflective surfaces. In this paper, we propose a multi-level pyramid fusion volume (MPFV-Stereo) that contains two main components: a multi-scale cost volume (MSCV) and a multi-level cost volume (MLCV). We also design a low-parameter Gaussian attention module to excite the cost volume. MPFV-Stereo ranks 2nd on KITTI 2012 (Reflective) among all published methods. In addition, it achieves competitive results on both the Scene Flow and KITTI datasets and requires less training to achieve strong cross-dataset generalization on the Middlebury and ETH3D benchmarks.
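The abstract does not describe how MSCV and MLCV are built. The sketch below is a generic illustration only, not MPFV-Stereo's method. It builds a correlation cost volume, a common construction in CNN stereo, at several pyramid scales. It assumes rectified (C, H, W) feature maps, an average-pooling pyramid, and a channel-mean correlation score. The function names `correlation_cost_volume`, `downsample`, and `multi_scale_cost_volumes` are hypothetical.

```python
import numpy as np


def correlation_cost_volume(left, right, max_disp):
    """Correlation cost volume of shape (max_disp, H, W).

    left, right: (C, H, W) feature maps from a rectified stereo pair (assumed).
    cost[d, y, x] = mean over channels of left[:, y, x] * right[:, y, x - d];
    columns with x - d < 0 have no candidate match and are left at 0.
    """
    _, h, w = left.shape
    cost = np.zeros((max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        if d == 0:
            cost[0] = (left * right).mean(axis=0)
        else:
            cost[d, :, d:] = (left[:, :, d:] * right[:, :, :-d]).mean(axis=0)
    return cost


def downsample(feat, factor):
    """Average-pool a (C, H, W) map by an integer factor (H and W must be divisible by it)."""
    c, h, w = feat.shape
    return feat.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))


def multi_scale_cost_volumes(left, right, max_disp, scales=(1, 2, 4)):
    """Hypothetical pyramid: one cost volume per level; the disparity range shrinks with resolution."""
    return [
        correlation_cost_volume(downsample(left, s), downsample(right, s), max_disp // s)
        for s in scales
    ]


# Usage: random features with a known 3-pixel disparity.
rng = np.random.default_rng(0)
left = rng.standard_normal((128, 16, 32))
right = np.roll(left, -3, axis=2)  # right[:, :, x - 3] == left[:, :, x] for x >= 3
vols = multi_scale_cost_volumes(left, right, max_disp=8)
print([v.shape for v in vols])  # → [(8, 16, 32), (4, 8, 16), (2, 4, 8)]
disparity = vols[0].argmax(axis=0)  # winner-take-all disparity at full resolution
```

Correlation is only one option. Concatenation-based and group-wise correlation volumes are common alternatives, and the abstract does not say which one MPFV-Stereo uses or how its levels are fused.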

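Source journal

Multimedia Systems (Engineering & Technology: Computer Science, Theory & Methods)

CiteScore: 5.40
Self-citation rate: 7.70%
Articles published: 148
Review time: 4.5 months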

Journal description: This journal details innovative research ideas, emerging technologies, state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications. It features theoretical, experimental, and survey articles.