MAF-Stereo: Fast stereo matching through multi-branch attention fusion
Lei Jin, Ke Xu
ISA Transactions, journal article, published 2025-05-26
DOI: 10.1016/j.isatra.2025.05.038 (https://doi.org/10.1016/j.isatra.2025.05.038)
Citations: 0
Abstract
With advancements in computer vision, stereo matching has become a critical component in applications such as autonomous driving and 3D reconstruction. Traditional methods for achieving accurate matching often rely on high-resolution image features or deeper network architectures, which substantially compromise inference speed. In contrast, methods designed for faster performance typically simplify network structures, sacrificing accuracy to improve efficiency. Our study identifies a key limitation of these rapid methods: their exclusive reliance on low-resolution features during the feature resolution recovery process, which results in insufficiently informative recovered features. To address this limitation, we propose a novel module, the Multi-branch Attention Fusion (MAF), which leverages shallow features extracted in the early stages of processing to enhance feature resolution recovery during the cost aggregation phase. Additionally, we introduce an improvement to the cost volume generation process by incorporating cosine similarity, which alleviates the issue of weak correlation between left and right image features often encountered in conventional four-dimensional cost volumes. Building upon these contributions, we present MAF-Stereo, a method that achieves an endpoint error (EPE) of 0.57 and an inference speed of 41 ms on the Scene Flow dataset. Comprehensive evaluations on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) 2012 and 2015 datasets further demonstrate that MAF-Stereo outperforms existing fast matching methods in both speed and accuracy, establishing its effectiveness and robustness. The code is available at: https://github.com/LeiJ-USTB/MAF-Stereo/tree/main.
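To make the cosine-similarity idea concrete, the following is a minimal NumPy sketch of a correlation-style cost volume, together with the endpoint error (EPE) metric quoted above. It is not the authors' implementation: the abstract does not specify how the cosine term enters their four-dimensional volume, so the function names (`cosine_cost_volume`, `endpoint_error`), the `(C, H, W)` feature layout, and the zero fill for out-of-view pixels are all assumptions made for illustration. The linked repository holds the actual code.

```python
import numpy as np


def cosine_cost_volume(left, right, max_disp, eps=1e-8):
    """Illustrative cosine-similarity cost volume (hypothetical helper).

    left, right: feature maps of shape (C, H, W) from the two views.
    Returns an array of shape (max_disp, H, W) whose entry [d, y, x] is the
    cosine similarity between left[:, y, x] and right[:, y, x - d].
    """
    # L2-normalise along the channel axis so a dot product is a cosine.
    left_n = left / (np.linalg.norm(left, axis=0, keepdims=True) + eps)
    right_n = right / (np.linalg.norm(right, axis=0, keepdims=True) + eps)

    _, height, width = left.shape
    volume = np.zeros((max_disp, height, width), dtype=left.dtype)
    for d in range(max_disp):
        # A left pixel at column x corresponds to column x - d on the right;
        # columns with x < d have no match and stay at zero (an assumption).
        volume[d, :, d:] = (left_n[:, :, d:] * right_n[:, :, : width - d]).sum(axis=0)
    return volume


def endpoint_error(pred_disp, gt_disp, valid=None):
    """Mean absolute disparity error over valid pixels."""
    err = np.abs(pred_disp - gt_disp)
    return float(err.mean() if valid is None else err[valid].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    right_feat = rng.standard_normal((16, 8, 32))
    # Synthetic left view: the right features shifted by a disparity of 3.
    left_feat = np.roll(right_feat, 3, axis=2)
    vol = cosine_cost_volume(left_feat, right_feat, max_disp=8)
    best = vol.argmax(axis=0)
    print(best[:, 3:].min(), best[:, 3:].max())
```

Normalising each feature vector bounds every similarity to [-1, 1] regardless of feature magnitude, which is one plausible reading of how cosine similarity could strengthen the left-right correlation signal the abstract describes.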