Yun Wang;Kunhong Li;Longguang Wang;Junjie Hu;Dapeng Oliver Wu;Yulan Guo
{"title":"ADStereo: Efficient Stereo Matching With Adaptive Downsampling and Disparity Alignment","authors":"Yun Wang;Kunhong Li;Longguang Wang;Junjie Hu;Dapeng Oliver Wu;Yulan Guo","doi":"10.1109/TIP.2025.3540282","DOIUrl":null,"url":null,"abstract":"The balance between accuracy and computational efficiency is crucial for the applications of deep learning-based stereo matching algorithms in real-world scenarios. Since matching cost aggregation is usually the most computationally expensive component, a common practice is to construct cost volumes at a low resolution for aggregation and then directly regress a high-resolution disparity map. However, current solutions often suffer from limitations such as the loss of discriminative features caused by downsampling operations that treat all pixels equally, and spatial misalignment resulting from repeated downsampling and upsampling. To overcome these challenges, this paper presents two sampling strategies: the Adaptive Downsampling Module (ADM) and the Disparity Alignment Module (DAM), to prioritize real-time inference while ensuring accuracy. The ADM leverages local features to learn adaptive weights, enabling more effective downsampling while preserving crucial structure information. On the other hand, the DAM employs a learnable interpolation strategy to predict transformation offsets of pixels, thereby mitigating the spatial misalignment issue. Building upon these modules, we introduce ADStereo, a real-time yet accurate network that achieves highly competitive performance on multiple public benchmarks. Specifically, our ADStereo runs over <inline-formula> <tex-math>$5\\times $ </tex-math></inline-formula> faster than the current state-of-the-art CREStereo (0.054s vs. <inline-formula> <tex-math>$0.29{s}$ </tex-math></inline-formula>) under the same hardware while achieving comparable accuracy (1.82% vs. 1.69%) on the KITTI stereo 2015 benchmark. The codes are available at: <uri>https://github.com/cocowy1/ADStereo</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1204-1218"},"PeriodicalIF":0.0000,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10890914/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
The balance between accuracy and computational efficiency is crucial for the applications of deep learning-based stereo matching algorithms in real-world scenarios. Since matching cost aggregation is usually the most computationally expensive component, a common practice is to construct cost volumes at a low resolution for aggregation and then directly regress a high-resolution disparity map. However, current solutions often suffer from limitations such as the loss of discriminative features caused by downsampling operations that treat all pixels equally, and spatial misalignment resulting from repeated downsampling and upsampling. To overcome these challenges, this paper presents two sampling strategies: the Adaptive Downsampling Module (ADM) and the Disparity Alignment Module (DAM), to prioritize real-time inference while ensuring accuracy. The ADM leverages local features to learn adaptive weights, enabling more effective downsampling while preserving crucial structure information. On the other hand, the DAM employs a learnable interpolation strategy to predict transformation offsets of pixels, thereby mitigating the spatial misalignment issue. Building upon these modules, we introduce ADStereo, a real-time yet accurate network that achieves highly competitive performance on multiple public benchmarks. Specifically, our ADStereo runs over $5\times $ faster than the current state-of-the-art CREStereo (0.054s vs. $0.29{s}$ ) under the same hardware while achieving comparable accuracy (1.82% vs. 1.69%) on the KITTI stereo 2015 benchmark. The codes are available at: https://github.com/cocowy1/ADStereo.