Real-time stereo matching with enhanced geometric comprehension through cross-attention integration
Hosein Hashemi, Yasser Baleghi, Mohamad Reza Hassanzadeh
Neurocomputing, Volume 636, Article 130069 (published 2025-03-20)
DOI: 10.1016/j.neucom.2025.130069
Abstract
Accurate disparity estimation through stereo matching remains a critical challenge, especially for real-time applications. This work introduces a computationally efficient framework that achieves both high accuracy and real-time performance in stereo-based disparity estimation. The approach has three key components. First, a context cross-attention (CCA) module improves cost volume aggregation by applying localized cross-attention, which strengthens the network's geometric understanding. Second, a guided concatenation volume (GCV) improves feature matching by combining correlation cues with contextual information, reducing computational redundancy while preserving important spatial detail. Third, an uncertainty-based refinement (UR) module uses an uncertainty map, a context feature map, and a geometry feature map to correct errors in difficult areas such as textureless regions and occlusions. Comprehensive experiments on multiple benchmark datasets, including KITTI, SceneFlow, Middlebury, and ETH3D, show that the proposed model outperforms existing state-of-the-art real-time approaches on accuracy metrics while maintaining comparable computational efficiency. These results establish the framework as a viable solution for demanding real-world applications, particularly autonomous driving and robotics systems, where real-time performance is crucial. The source code is available at https://github.com/kayhan-hashemi/CCAStereo.
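The abstract does not describe how the modules are built internally. As a rough illustration of two ideas it names (correlation cues guiding a concatenation volume, and an uncertainty map derived from the disparity estimate), here is a minimal pure-Python sketch on a single scanline. It is not the authors' implementation. The function names, the softmax weighting of the concatenated features, the soft-argmin regression, and the use of the standard deviation of each pixel's disparity distribution as the "uncertainty map" are all assumptions chosen for clarity; the linked CCAStereo repository is the authoritative reference.

```python
import math
from typing import List, Tuple

Vector = List[float]
Volume = List[List[float]]  # indexed as volume[d][x]


def correlation_volume(left: List[Vector], right: List[Vector], max_disp: int) -> Volume:
    """Mean channel-wise dot product between left pixel x and right pixel x - d.

    Candidates with x - d < 0 have no right-image partner and are masked with -inf.
    """
    channels = len(left[0])
    cost = []
    for d in range(max_disp):
        row = []
        for x in range(len(left)):
            if x - d < 0:
                row.append(-math.inf)
            else:
                row.append(sum(a * b for a, b in zip(left[x], right[x - d])) / channels)
        cost.append(row)
    return cost


def disparity_softmax(cost: Volume) -> Volume:
    """Numerically stable softmax over the disparity axis, independently per pixel."""
    max_disp, width = len(cost), len(cost[0])
    prob = [[0.0] * width for _ in range(max_disp)]
    for x in range(width):
        column = [cost[d][x] for d in range(max_disp)]
        peak = max(column)  # d = 0 is always valid, so peak is finite
        exps = [math.exp(c - peak) for c in column]  # exp(-inf) == 0.0 for masked entries
        total = sum(exps)
        for d in range(max_disp):
            prob[d][x] = exps[d] / total
    return prob


def regress_with_uncertainty(prob: Volume) -> Tuple[List[float], List[float]]:
    """Soft-argmin disparity (expected value) and its standard deviation per pixel."""
    max_disp, width = len(prob), len(prob[0])
    disparity, uncertainty = [], []
    for x in range(width):
        mean = sum(d * prob[d][x] for d in range(max_disp))
        var = sum((d - mean) ** 2 * prob[d][x] for d in range(max_disp))
        disparity.append(mean)
        uncertainty.append(math.sqrt(var))
    return disparity, uncertainty


def guided_concat_volume(left: List[Vector], right: List[Vector], prob: Volume) -> List[List[Vector]]:
    """Hypothetical 'guided' concatenation: [left[x], right[x - d]] scaled by prob[d][x]."""
    channels = len(left[0])
    volume = []
    for d in range(len(prob)):
        row = []
        for x in range(len(left)):
            if x - d < 0:
                row.append([0.0] * (2 * channels))  # no valid match: zero features
            else:
                row.append([prob[d][x] * v for v in left[x] + right[x - d]])
        volume.append(row)
    return volume


# Toy scanline: left[x] == right[x - 1] for x >= 1, so the true disparity is 1.
right_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
left_feats = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

prob = disparity_softmax(correlation_volume(left_feats, right_feats, max_disp=2))
disp, unc = regress_with_uncertainty(prob)
print([round(v, 3) for v in disp])  # [0.0, 0.622, 0.5, 0.731]
print([round(v, 3) for v in unc])   # x = 2 is ambiguous, so it gets the largest value
```

In this toy case the uncertainty peaks at x = 2, where two candidate disparities match equally well. Ambiguous pixels of this kind (as in textureless regions) are the sort an uncertainty-guided refinement stage would target, although how the UR module actually combines the uncertainty, context, and geometry maps is not specified in the abstract.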
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.