IGEV++: Iterative Multi-Range Geometry Encoding Volumes for Stereo Matching

IEEE Transactions on Pattern Analysis and Machine Intelligence, Pub Date: 2025-03-12, DOI: 10.1109/TPAMI.2025.3569218

Gangwei Xu, Xianqi Wang, Zhaoxing Zhang, Junda Cheng, Chunyuan Liao, Xin Yang

Volume 47, Issue 8, pp. 7108-7122

Abstract

Stereo matching is a core component in many computer vision and robotics systems. Despite significant advances over the last decade, handling matching ambiguities in ill-posed regions and large disparities remains an open challenge. In this paper, we propose a new deep network architecture for stereo matching, called IGEV++. IGEV++ constructs Multi-range Geometry Encoding Volumes (MGEV), which encode coarse-grained geometry information for ill-posed regions and large disparities while preserving fine-grained geometry information for details and small disparities. To construct MGEV, we introduce an adaptive patch matching module that efficiently and effectively computes matching costs for large disparity ranges and/or ill-posed regions. We further propose a selective geometry feature fusion module that adaptively fuses the multi-range, multi-granularity geometry features in MGEV. The fused geometry features are then fed into ConvGRUs, which iteratively update the disparity map. MGEV allows IGEV++ to efficiently handle large disparities and ill-posed regions, such as occlusions and textureless regions, and to converge rapidly during iterations. IGEV++ achieves the best performance on the Scene Flow test set across all disparity ranges, up to 768 px. It also achieves state-of-the-art accuracy on the Middlebury, ETH3D, KITTI 2012, and KITTI 2015 benchmarks. Specifically, IGEV++ achieves a 3.23% 2-pixel outlier rate (Bad 2.0) on Middlebury, a large-disparity benchmark, representing error reductions of 31.9% and 54.8% compared to RAFT-Stereo and GMStereo, respectively. We also present a real-time version of IGEV++ that achieves the best performance among all published real-time methods on the KITTI benchmarks.
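To make the reported Middlebury figures concrete, the following is a minimal sketch, assuming NumPy arrays of predicted and ground-truth disparities in pixels. Bad 2.0 is the percentage of evaluated pixels whose absolute disparity error exceeds 2 px, and a relative error reduction is (baseline - ours) / baseline. The function names `bad_pixel_rate`, `relative_reduction`, and `implied_baseline` are hypothetical helpers, not part of any IGEV++ release, and the benchmark's own evaluation masks (e.g. non-occluded vs. all pixels) are not modeled beyond a simple validity mask. Back-computing from the abstract's rounded percentages yields roughly 4.74% for RAFT-Stereo and 7.15% for GMStereo. These values are derived, not quoted results.

```python
import numpy as np


def bad_pixel_rate(pred, gt, threshold=2.0, valid=None):
    """Percentage of valid pixels whose |pred - gt| exceeds `threshold` px.

    threshold=2.0 gives the Bad 2.0 metric. Assumption: when no mask is
    supplied, any pixel with finite ground truth counts as valid.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid is None:
        valid = np.isfinite(gt)  # skip pixels without ground truth (NaN/inf)
    err = np.abs(pred - gt)[valid]  # absolute disparity error on valid pixels only
    return 100.0 * np.mean(err > threshold)


def relative_reduction(baseline, ours):
    """Relative error reduction in percent: (baseline - ours) / baseline * 100."""
    return 100.0 * (baseline - ours) / baseline


def implied_baseline(ours, reduction_pct):
    """Invert relative_reduction: the baseline error implied by a reported reduction."""
    return ours / (1.0 - reduction_pct / 100.0)


if __name__ == "__main__":
    gt = np.array([[10.0, 20.0], [30.0, 40.0]])
    pred = np.array([[10.5, 23.0], [30.0, 37.5]])
    # Errors are 0.5, 3.0, 0.0, 2.5 px; 2 of 4 exceed 2 px.
    print(bad_pixel_rate(pred, gt))  # 50.0
    # Baselines implied by the abstract's 3.23% Bad 2.0 and its rounded reductions.
    print(round(implied_baseline(3.23, 31.9), 2))  # 4.74 (RAFT-Stereo, implied)
    print(round(implied_baseline(3.23, 54.8), 2))  # 7.15 (GMStereo, implied)
```

Because the reductions are reported to one decimal place, the implied baselines are only accurate to roughly the second decimal; the published baseline scores should be taken from the benchmark leaderboard rather than from this inversion.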
