G2L-Stereo: Global to Local Two-Stage Real-Time Stereo Matching Network

Impact factor 4.8; CAS Zone 2 (Computer Science); JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Computational Imaging Pub Date : 2025-06-19 DOI:10.1109/TCI.2025.3581105

Jie Tang;Gaofeng Peng;Jialu Liu;Bo Yu

Volume 11, pages 852-863. Full text: https://ieeexplore.ieee.org/document/11045114/

Citations: 0

Abstract

Developing fast and accurate stereo matching algorithms is crucial for real-world embedded vision applications. Depth information plays a significant role in scene understanding, and depth calculated through stereo matching is generally considered to be more precise and reliable than that obtained from monocular depth estimation. However, speed-oriented stereo matching methods often suffer from poor feature representation due to sparse sampling and detail loss caused by unreasonable disparity allocation during upsampling. To address these issues, we propose G2L-Stereo, a two-stage real-time stereo matching network that combines global disparity range prediction and local disparity range prediction. In the global disparity range prediction stage, we introduce feature-guided connections for cost aggregation, enhancing the expressive power of sparse features by aligning the feature space across different scales of cost volumes. We also incorporate confidence estimation into the upsampling algorithm to reduce the propagation of inaccurate disparities during upsampling, yielding more precise disparity maps. In the local disparity range prediction stage, we develop a disparity refinement module guided by neighborhood similarity. This module aggregates similar neighboring costs to estimate disparity residuals and refine disparities, restoring lost details in the low-resolution disparity map and further enhancing disparity accuracy. Extensive experiments on the SceneFlow and KITTI datasets validate the effectiveness of our model, showing that G2L-Stereo achieves fast inference while maintaining accuracy comparable to state-of-the-art methods.
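The abstract does not give the details of the confidence-aware upsampling step, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a low-resolution disparity map and a per-pixel confidence map in [0, 1], and it upsamples by averaging a 3x3 low-resolution neighborhood weighted by confidence, so that low-confidence disparities contribute little to the result. The function name, the window size, and the fallback rule are all illustrative assumptions.

```python
from typing import List

Grid = List[List[float]]


def confidence_weighted_upsample(
    disp: Grid, conf: Grid, scale: int = 2, eps: float = 1e-6
) -> Grid:
    """Upsample a low-res disparity map by `scale` (hypothetical helper).

    Each output pixel is the confidence-weighted mean of the 3x3
    low-resolution neighborhood around its parent pixel. This is an
    illustrative stand-in, not the method from the paper.
    """
    h, w = len(disp), len(disp[0])
    out = [[0.0] * (w * scale) for _ in range(h * scale)]
    for y in range(h * scale):
        for x in range(w * scale):
            cy, cx = y // scale, x // scale  # parent low-res pixel
            num = 0.0
            den = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        num += conf[ny][nx] * disp[ny][nx]
                        den += conf[ny][nx]
            # Fall back to the parent pixel when no neighbor is trusted.
            avg = num / den if den > eps else disp[cy][cx]
            # Disparity is measured in pixels, so it scales with resolution.
            out[y][x] = avg * scale
    return out


if __name__ == "__main__":
    low_res = [[1.0, 9.0]]      # 9.0 plays the role of an unreliable estimate
    confidence = [[1.0, 0.0]]   # ... and is given zero confidence
    for row in confidence_weighted_upsample(low_res, confidence, scale=2):
        print(row)
```

In this toy input the zero-confidence value 9.0 is ignored entirely, which illustrates how confidence weighting can limit the propagation of inaccurate disparities during upsampling. A learned network would typically predict both the confidence and the weights rather than use a fixed window.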


Source journal

IEEE Transactions on Computational Imaging (Mathematics: Computational Mathematics)

CiteScore: 8.20

Self-citation rate: 7.40%

About the journal: The IEEE Transactions on Computational Imaging will publish articles where computation plays an integral role in the image formation process. Papers will cover all areas of computational imaging ranging from fundamental theoretical methods to the latest innovative computational imaging system designs. Topics of interest will include advanced algorithms and mathematical techniques, model-based data inversion, methods for image and signal recovery from sparse and incomplete data, techniques for non-traditional sensing of image data, methods for dynamic information acquisition and extraction from imaging sensors, software and hardware for efficient computation in imaging systems, and highly novel imaging system design.