{"title":"G2L-Stereo: Global to Local Two-Stage Real-Time Stereo Matching Network","authors":"Jie Tang;Gaofeng Peng;Jialu Liu;Bo Yu","doi":"10.1109/TCI.2025.3581105","DOIUrl":null,"url":null,"abstract":"Developing fast and accurate stereo matching algorithms is crucial for real-world embedded vision applications. Depth information plays a significant role in scene understanding, and depth calculated through stereo matching is generally considered to be more precise and reliable than that obtained from monocular depth estimation. However, speed-oriented stereo matching methods often suffer from poor feature representation due to sparse sampling and detail loss caused by unreasonable disparity allocation during upsampling. To address these issues, we propose G2L-Stereo, a two-stage real-time stereo matching network that combines global disparity range prediction and local disparity range prediction. In the global disparity range prediction stage, we introduce feature-guided connections for cost aggregation, enhancing the expressive power of sparse features by aligning the feature space across different scales of cost volumes. We also incorporate confidence estimation into the upsampling algorithm to reduce the propagation of inaccurate disparities during upsampling, yielding more precise disparity maps. In the local disparity range prediction stage, we develop a disparity refinement module guided by neighborhood similarity. This module aggregates similar neighboring costs to estimate disparity residuals and refine disparities, restoring lost details in the low-resolution disparity map and further enhancing disparity accuracy. Extensive experiments on the SceneFlow and KITTI datasets validate the effectiveness of our model, showing that G2L-Stereo achieves fast inference while maintaining accuracy comparable to state-of-the-art methods.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"11 ","pages":"852-863"},"PeriodicalIF":4.8000,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computational Imaging","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11045114/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0
Abstract
Developing fast and accurate stereo matching algorithms is crucial for real-world embedded vision applications. Depth information plays a significant role in scene understanding, and depth calculated through stereo matching is generally considered to be more precise and reliable than that obtained from monocular depth estimation. However, speed-oriented stereo matching methods often suffer from poor feature representation due to sparse sampling and detail loss caused by unreasonable disparity allocation during upsampling. To address these issues, we propose G2L-Stereo, a two-stage real-time stereo matching network that combines global disparity range prediction and local disparity range prediction. In the global disparity range prediction stage, we introduce feature-guided connections for cost aggregation, enhancing the expressive power of sparse features by aligning the feature space across different scales of cost volumes. We also incorporate confidence estimation into the upsampling algorithm to reduce the propagation of inaccurate disparities during upsampling, yielding more precise disparity maps. In the local disparity range prediction stage, we develop a disparity refinement module guided by neighborhood similarity. This module aggregates similar neighboring costs to estimate disparity residuals and refine disparities, restoring lost details in the low-resolution disparity map and further enhancing disparity accuracy. Extensive experiments on the SceneFlow and KITTI datasets validate the effectiveness of our model, showing that G2L-Stereo achieves fast inference while maintaining accuracy comparable to state-of-the-art methods.
期刊介绍:
The IEEE Transactions on Computational Imaging will publish articles where computation plays an integral role in the image formation process. Papers will cover all areas of computational imaging ranging from fundamental theoretical methods to the latest innovative computational imaging system designs. Topics of interest will include advanced algorithms and mathematical techniques, model-based data inversion, methods for image and signal recovery from sparse and incomplete data, techniques for non-traditional sensing of image data, methods for dynamic information acquisition and extraction from imaging sensors, software and hardware for efficient computation in imaging systems, and highly novel imaging system design.