Multi-level pyramid fusion for efficient stereo matching
Authors: Jiaqi Zhu, Bin Li, Xinhua Zhao
Journal: Multimedia Systems (Journal Article), published 2024-08-12
DOI: 10.1007/s00530-024-01419-4 (https://doi.org/10.1007/s00530-024-01419-4)
Impact factor: 3.5; JCR: Q2 (Computer Science, Information Systems)
Citation count: 0
Abstract
Stereo matching is a key technology for many autonomous driving and robotics applications. Recently, methods based on convolutional neural networks have made substantial progress. However, it remains difficult to find accurate matching points in inherently ill-posed regions, such as weakly textured areas and reflective surfaces. In this paper, we propose a multi-level pyramid fusion volume (MPFV-Stereo), which contains two prominent components: a multi-scale cost volume (MSCV) and a multi-level cost volume (MLCV). We also design a low-parameter Gaussian attention module to excite the cost volume. MPFV-Stereo ranks 2nd on KITTI 2012 (Reflective) among all published methods. In addition, MPFV-Stereo achieves competitive results on both the Scene Flow and KITTI datasets, and requires less training to achieve strong cross-dataset generalization on the Middlebury and ETH3D benchmarks.
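For readers unfamiliar with the term, a cost volume stores a matching score for every pixel at every candidate disparity. The following is a minimal NumPy sketch of a generic correlation cost volume computed at several resolutions. It is an illustration only and not the authors' MSCV or MLCV design, which the abstract does not specify; the function names, the 2x average pooling, and the halving of the disparity range per level are all assumptions made for this example.

```python
import numpy as np


def correlation_cost_volume(left, right, max_disp):
    """Generic correlation volume (illustrative, not the paper's method).

    left, right: feature maps of shape (C, H, W).
    Returns an array of shape (max_disp, H, W) whose entry [d, y, x] is the
    mean channel-wise product of left[:, y, x] and right[:, y, x - d].
    """
    _, h, w = left.shape
    volume = np.zeros((max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        if d == 0:
            volume[d] = (left * right).mean(axis=0)
        else:
            # Left pixel x is compared with right pixel x - d;
            # columns x < d have no valid match and stay at zero.
            volume[d, :, d:] = (left[:, :, d:] * right[:, :, :-d]).mean(axis=0)
    return volume


def downsample2(feat):
    """2x average pooling over H and W (both assumed even)."""
    c, h, w = feat.shape
    return feat.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def multi_scale_volumes(left, right, max_disp, levels=3):
    """Build one correlation volume per pyramid level (hypothetical helper)."""
    volumes = []
    for level in range(levels):
        volumes.append(correlation_cost_volume(left, right, max_disp))
        if level < levels - 1:
            left, right = downsample2(left), downsample2(right)
            # Disparities shrink with resolution, so halve the search range.
            max_disp = max(1, max_disp // 2)
    return volumes
```

In a learned pipeline, `left` and `right` would be CNN feature maps, and a network would typically fuse and regularize the per-level volumes before regressing disparity; how MPFV-Stereo performs that fusion is described in the paper itself, not here.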
About the journal:
This journal details innovative research ideas, emerging technologies, state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications. It features theoretical, experimental, and survey articles.