{"title":"深度引导的三轴融合网络,用于高效的广义立体匹配","authors":"Seunghun Moon;Haeuk Lee;Suk-Ju Kang","doi":"10.1109/LRA.2025.3606382","DOIUrl":null,"url":null,"abstract":"Stereo matching is a crucial task in computer vision that estimates pixel-level disparities from rectified image pairs to reconstruct three-dimensional depth information. It has diverse applications, ranging from augmented reality to autonomous driving. While deep learning-based methods have achieved remarkable progress through 3D CNNs and Transformer-based architectures, their reliance on domain-specific fine-tuning and localized feature extraction often hampers robustness and generalization in real-world scenarios. This letter introduces the Depth-Guided Tri-Axial Fusion Network (DGTFNet), which overcomes these limitations by integrating depth priors from a monocular depth foundation model via the Depth-Guided Cross-Modal Attention (DGCMA) module. Additionally, we propose a Tri-Axial Attention (TAA) module that employs directional strip convolutions to capture long-range dependencies across horizontal, vertical, and spatial dimensions. Extensive evaluations on public stereo benchmarks demonstrate that DGTFNet significantly outperforms state-of-the-art methods in zero-shot evaluations. Ablation studies further validate the contribution of each module in delivering robust and efficient stereo matching.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 10","pages":"10791-10798"},"PeriodicalIF":5.3000,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DGTFNet: Depth-Guided Tri-Axial Fusion Network for Efficient Generalizable Stereo Matching\",\"authors\":\"Seunghun Moon;Haeuk Lee;Suk-Ju Kang\",\"doi\":\"10.1109/LRA.2025.3606382\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Stereo matching is a crucial task in computer vision that estimates pixel-level disparities from rectified image pairs to reconstruct three-dimensional depth information. It has diverse applications, ranging from augmented reality to autonomous driving. While deep learning-based methods have achieved remarkable progress through 3D CNNs and Transformer-based architectures, their reliance on domain-specific fine-tuning and localized feature extraction often hampers robustness and generalization in real-world scenarios. This letter introduces the Depth-Guided Tri-Axial Fusion Network (DGTFNet), which overcomes these limitations by integrating depth priors from a monocular depth foundation model via the Depth-Guided Cross-Modal Attention (DGCMA) module. Additionally, we propose a Tri-Axial Attention (TAA) module that employs directional strip convolutions to capture long-range dependencies across horizontal, vertical, and spatial dimensions. Extensive evaluations on public stereo benchmarks demonstrate that DGTFNet significantly outperforms state-of-the-art methods in zero-shot evaluations. 
Ablation studies further validate the contribution of each module in delivering robust and efficient stereo matching.\",\"PeriodicalId\":13241,\"journal\":{\"name\":\"IEEE Robotics and Automation Letters\",\"volume\":\"10 10\",\"pages\":\"10791-10798\"},\"PeriodicalIF\":5.3000,\"publicationDate\":\"2025-09-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Robotics and Automation Letters\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11150692/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ROBOTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Robotics and Automation Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11150692/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Stereo matching is a crucial task in computer vision that estimates pixel-level disparities from rectified image pairs to reconstruct three-dimensional depth information. It has diverse applications, ranging from augmented reality to autonomous driving. While deep learning-based methods have achieved remarkable progress through 3D CNNs and Transformer-based architectures, their reliance on domain-specific fine-tuning and localized feature extraction often hampers robustness and generalization in real-world scenarios. This letter introduces the Depth-Guided Tri-Axial Fusion Network (DGTFNet), which overcomes these limitations by integrating depth priors from a monocular depth foundation model via the Depth-Guided Cross-Modal Attention (DGCMA) module. Additionally, we propose a Tri-Axial Attention (TAA) module that employs directional strip convolutions to capture long-range dependencies across horizontal, vertical, and spatial dimensions. Extensive evaluations on public stereo benchmarks demonstrate that DGTFNet significantly outperforms state-of-the-art methods in zero-shot evaluations. Ablation studies further validate the contribution of each module in delivering robust and efficient stereo matching.
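To make the two mechanisms named in the abstract more concrete, below is a minimal PyTorch-style sketch of how directional strip convolutions (the Tri-Axial Attention idea) and a cross-modal attention that lets stereo features query monocular depth-prior features (the DGCMA idea) could be wired together. All class names, tensor shapes, kernel sizes, and the fusion order are illustrative assumptions, not the letter's actual DGTFNet implementation.

```python
# Illustrative sketch only -- NOT the published DGTFNet architecture.
# (a) Horizontal/vertical strip convolutions plus a spatial gate, loosely mirroring Tri-Axial Attention.
# (b) Cross-attention where stereo features query features from a monocular depth foundation model.
import torch
import torch.nn as nn


class TriAxialAttentionSketch(nn.Module):
    """Illustrative tri-axial attention: horizontal strip, vertical strip, and spatial gating."""

    def __init__(self, channels: int, strip_len: int = 11):
        super().__init__()
        pad = strip_len // 2
        # 1 x k depthwise strip convolution -> long-range context along the horizontal axis.
        self.horizontal = nn.Conv2d(channels, channels, kernel_size=(1, strip_len),
                                    padding=(0, pad), groups=channels)
        # k x 1 depthwise strip convolution -> long-range context along the vertical axis.
        self.vertical = nn.Conv2d(channels, channels, kernel_size=(strip_len, 1),
                                  padding=(pad, 0), groups=channels)
        # Small spatial branch producing a per-pixel attention map.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Fuse the directional responses, then gate the input features with the spatial map.
        context = self.horizontal(x) + self.vertical(x) + x
        attention = self.spatial(context)
        return self.proj(x * attention)


class DepthGuidedCrossAttentionSketch(nn.Module):
    """Illustrative cross-modal attention: stereo features attend to depth-prior features."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=channels, num_heads=num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, stereo_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # stereo_feat, depth_feat: (B, C, H, W) feature maps at the same resolution.
        b, c, h, w = stereo_feat.shape
        q = stereo_feat.flatten(2).transpose(1, 2)   # (B, H*W, C) queries from the stereo branch
        kv = depth_feat.flatten(2).transpose(1, 2)   # (B, H*W, C) keys/values from the depth prior
        fused, _ = self.attn(self.norm(q), self.norm(kv), self.norm(kv))
        fused = (q + fused).transpose(1, 2).reshape(b, c, h, w)  # residual fusion back to a map
        return fused


if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 64)        # stereo features (illustrative shape)
    depth_prior = torch.randn(2, 64, 32, 64)  # features from a monocular depth model
    out = TriAxialAttentionSketch(64)(DepthGuidedCrossAttentionSketch(64)(feats, depth_prior))
    print(out.shape)  # torch.Size([2, 64, 32, 64])
```

In this sketch, the strip convolutions are depthwise to keep the added cost low, which is in the spirit of the letter's emphasis on efficiency; the actual module designs, kernel sizes, and fusion order in DGTFNet may differ.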
About the journal:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.