{"title":"DGTFNet: Depth-Guided Tri-Axial Fusion Network for Efficient Generalizable Stereo Matching","authors":"Seunghun Moon;Haeuk Lee;Suk-Ju Kang","doi":"10.1109/LRA.2025.3606382","DOIUrl":null,"url":null,"abstract":"Stereo matching is a crucial task in computer vision that estimates pixel-level disparities from rectified image pairs to reconstruct three-dimensional depth information. It has diverse applications, ranging from augmented reality to autonomous driving. While deep learning-based methods have achieved remarkable progress through 3D CNNs and Transformer-based architectures, their reliance on domain-specific fine-tuning and localized feature extraction often hampers robustness and generalization in real-world scenarios. This letter introduces the Depth-Guided Tri-Axial Fusion Network (DGTFNet), which overcomes these limitations by integrating depth priors from a monocular depth foundation model via the Depth-Guided Cross-Modal Attention (DGCMA) module. Additionally, we propose a Tri-Axial Attention (TAA) module that employs directional strip convolutions to capture long-range dependencies across horizontal, vertical, and spatial dimensions. Extensive evaluations on public stereo benchmarks demonstrate that DGTFNet significantly outperforms state-of-the-art methods in zero-shot evaluations. Ablation studies further validate the contribution of each module in delivering robust and efficient stereo matching.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 10","pages":"10791-10798"},"PeriodicalIF":5.3000,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Robotics and Automation Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11150692/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Citations: 0
Abstract
Stereo matching is a crucial task in computer vision that estimates pixel-level disparities from rectified image pairs to reconstruct three-dimensional depth information. It has diverse applications, ranging from augmented reality to autonomous driving. While deep learning-based methods have achieved remarkable progress through 3D CNNs and Transformer-based architectures, their reliance on domain-specific fine-tuning and localized feature extraction often hampers robustness and generalization in real-world scenarios. This letter introduces the Depth-Guided Tri-Axial Fusion Network (DGTFNet), which overcomes these limitations by integrating depth priors from a monocular depth foundation model via the Depth-Guided Cross-Modal Attention (DGCMA) module. Additionally, we propose a Tri-Axial Attention (TAA) module that employs directional strip convolutions to capture long-range dependencies across horizontal, vertical, and spatial dimensions. Extensive evaluations on public stereo benchmarks demonstrate that DGTFNet significantly outperforms state-of-the-art methods in zero-shot evaluations. Ablation studies further validate the contribution of each module in delivering robust and efficient stereo matching.
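To make the two modules named above more concrete, here is a minimal PyTorch sketch. This is not the authors' implementation: the class names, channel sizes, single-head attention, strip length, and sigmoid gating are illustrative assumptions inferred from the abstract's description of depth-guided cross-modal attention and directional strip convolutions.

```python
# Minimal sketch of the two ideas described in the abstract (assumptions only).
import torch
import torch.nn as nn

class DepthGuidedCrossAttention(nn.Module):
    """Sketch of cross-modal attention: stereo features query depth-prior
    features produced by a monocular depth foundation model."""
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Conv2d(dim, dim, 1)   # queries from stereo features
        self.k = nn.Conv2d(dim, dim, 1)   # keys from depth-prior features
        self.v = nn.Conv2d(dim, dim, 1)   # values from depth-prior features
        self.scale = dim ** -0.5

    def forward(self, stereo_feat, depth_feat):
        b, c, h, w = stereo_feat.shape
        q = self.q(stereo_feat).flatten(2).transpose(1, 2)   # (B, HW, C)
        k = self.k(depth_feat).flatten(2)                    # (B, C, HW)
        v = self.v(depth_feat).flatten(2).transpose(1, 2)    # (B, HW, C)
        attn = torch.softmax(q @ k * self.scale, dim=-1)     # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return stereo_feat + out                             # residual fusion

class TriAxialStripAttention(nn.Module):
    """Sketch of directional strip convolutions: long horizontal and vertical
    depthwise kernels plus a local spatial branch, fused by a 1x1 conv and
    used to reweight the input features."""
    def __init__(self, dim: int, strip: int = 9):
        super().__init__()
        pad = strip // 2
        self.horizontal = nn.Conv2d(dim, dim, (1, strip), padding=(0, pad), groups=dim)
        self.vertical = nn.Conv2d(dim, dim, (strip, 1), padding=(pad, 0), groups=dim)
        self.spatial = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.fuse = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        gate = self.fuse(self.horizontal(x) + self.vertical(x) + self.spatial(x))
        return x * torch.sigmoid(gate)    # attention-style reweighting

if __name__ == "__main__":
    f_stereo = torch.randn(1, 64, 32, 64)   # toy stereo feature map
    f_depth = torch.randn(1, 64, 32, 64)    # toy monocular depth-prior features
    fused = DepthGuidedCrossAttention(64)(f_stereo, f_depth)
    out = TriAxialStripAttention(64)(fused)
    print(out.shape)                        # torch.Size([1, 64, 32, 64])
```

The sketch only conveys the structural idea: depth priors are injected through cross-attention, and long strip kernels extend the receptive field along the horizontal and vertical axes at low cost; the paper's actual fusion scheme may differ.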
Journal Description:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.