MAF-Stereo: Fast stereo matching through multi-branch attention fusion.

Lei Jin, Ke Xu
ISA Transactions, published 2025-05-26. DOI: 10.1016/j.isatra.2025.05.038 (https://doi.org/10.1016/j.isatra.2025.05.038)

Abstract

With advances in computer vision, stereo matching has become a critical component of applications such as autonomous driving and 3D reconstruction. Methods that aim for accurate matching often rely on high-resolution image features or deeper network architectures, which substantially slow inference. In contrast, methods designed for speed typically simplify the network structure, sacrificing accuracy for efficiency. Our study identifies a key limitation of these fast methods: they rely exclusively on low-resolution features during feature resolution recovery, so the recovered features are insufficiently informative. To address this limitation, we propose a novel module, Multi-branch Attention Fusion (MAF), which uses shallow features extracted in the early stages of processing to enhance feature resolution recovery during the cost aggregation phase. Additionally, we improve cost volume generation by incorporating cosine similarity, which alleviates the weak correlation between left and right image features often encountered in conventional four-dimensional cost volumes. Building on these contributions, we present MAF-Stereo, a method that achieves an endpoint error (EPE) of 0.57 and an inference time of 41 ms on the Scene Flow dataset. Comprehensive evaluations on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) 2012 and 2015 datasets further show that MAF-Stereo outperforms existing fast matching methods in both speed and accuracy, demonstrating its effectiveness and robustness. The code is available at: https://github.com/LeiJ-USTB/MAF-Stereo/tree/main.
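
The abstract does not spell out how cosine similarity enters the cost volume, so the following is only a minimal NumPy sketch of the general idea, not the authors' implementation (which is in the repository linked above). It assumes rectified feature maps of shape (C, H, W), L2-normalises each pixel's feature vector, and stores the cosine similarity between a left pixel and the right pixel shifted by each candidate disparity. The result is a simple three-dimensional correlation-style volume rather than the four-dimensional volume the abstract mentions. The function names `cosine_cost_volume` and `endpoint_error`, the parameter `max_disp`, and the synthetic data are all illustrative assumptions. The second function computes the endpoint error (EPE) as the mean absolute disparity error over valid pixels.

```python
import numpy as np


def cosine_cost_volume(left, right, max_disp, eps=1e-8):
    """Cosine-similarity matching volume (hypothetical helper, not from the paper).

    left, right: feature maps of shape (C, H, W) from a rectified stereo pair.
    Returns an array of shape (max_disp, H, W); entry [d, y, x] is the cosine
    similarity between left[:, y, x] and right[:, y, x - d].
    """
    # L2-normalise every pixel's feature vector along the channel axis.
    left_n = left / (np.linalg.norm(left, axis=0, keepdims=True) + eps)
    right_n = right / (np.linalg.norm(right, axis=0, keepdims=True) + eps)

    _, height, width = left.shape
    volume = np.zeros((max_disp, height, width), dtype=left.dtype)
    for d in range(max_disp):
        if d == 0:
            volume[d] = (left_n * right_n).sum(axis=0)
        else:
            # Columns x < d have no counterpart in the right image; leave them at 0.
            volume[d, :, d:] = (left_n[:, :, d:] * right_n[:, :, :-d]).sum(axis=0)
    return volume


def endpoint_error(pred, gt, valid=None):
    """Mean absolute disparity error over the valid pixels."""
    if valid is None:
        valid = np.ones(gt.shape, dtype=bool)
    return float(np.abs(pred - gt)[valid].mean())


# Synthetic check: the left features are the right features shifted by 3 pixels,
# so the true disparity is 3 wherever the shift does not wrap around.
rng = np.random.default_rng(0)
right = rng.standard_normal((16, 4, 12))
left = np.roll(right, 3, axis=2)

vol = cosine_cost_volume(left, right, max_disp=8)
disp = vol.argmax(axis=0).astype(np.float64)  # winner-takes-all disparity

gt = np.full(disp.shape, 3.0)
valid = np.zeros(disp.shape, dtype=bool)
valid[:, 3:] = True  # ignore the wrapped-around columns

epe = endpoint_error(disp, gt, valid)
print(vol.shape, epe)
```

Normalising before the dot product bounds every entry to [-1, 1], so the matching score does not depend on feature magnitude. A real network would typically replace the winner-takes-all `argmax` with a learned aggregation stage followed by a differentiable disparity regression.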
