Binocular-Separated Modeling for Efficient Binocular Stereo Matching

IF 7.9 | Region 1, Engineering & Technology | JCR Q1, ENGINEERING, CIVIL

IEEE Transactions on Intelligent Transportation Systems Pub Date : 2025-02-04 DOI:10.1109/TITS.2025.3531115

Yeping Peng;Jianrui Xu;Guangzhong Cao;Runhao Zeng

Volume 26, Issue 3, pp. 3028-3038. Journal Article. URL: https://ieeexplore.ieee.org/document/10870872/

Citations: 0

Abstract

Binocular stereo matching is a crucial task in autonomous driving because it provides accurate depth estimates for objects and scenes. The task is challenging, however, because binocular image pairs contain ill-posed regions, such as repeated and weak textures, in which the correspondences between points are complex. Existing methods mainly extract features from the binocular input images with deep convolutional neural networks that stack a large number of convolutional layers, which can incur high memory and computation costs and makes them hard to deploy in real-world applications. In addition, previous methods do not consider the correlation between the unary features of the two views when constructing the cost volume, which leads to inferior results. To address these issues, we propose a novel lightweight binocular-separated feature extraction module that comprises a view-shared multi-dilation fusion module and a view-specific feature extractor. Our method uses a shallow neural network with a multi-dilation modeling module to provide receptive fields similar to those of deep neural networks, but with fewer parameters and better computational efficiency. Furthermore, we propose using the correlations of view-shared features to dynamically select view-specific features during the construction of the cost volume. Extensive experiments on two public benchmark datasets show that the proposed method outperforms the deep model-based baseline (a 13.6% improvement on Scene Flow and 2.0% on KITTI 2015) while using 29.7% fewer parameters. Ablation experiments show that our method achieves superior matching performance in weak-texture and edge regions. The source code will be made publicly available.
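The claim that a shallow network with multiple dilation rates can match the receptive field of a deeper one can be checked with simple arithmetic. The sketch below is only an illustration of that general property of dilated convolutions, not the architecture from the paper: the layer counts, the dilation rates (2, 4, 7), and all function names are hypothetical choices made for the example. It assumes stride-1 convolutions, where a kernel of size k with dilation d spans k + (k - 1)(d - 1) positions along one axis.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(layers):
    """Receptive field of stride-1 conv layers applied in sequence.

    `layers` is a list of (kernel_size, dilation) pairs.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += effective_kernel(kernel_size, dilation) - 1
    return rf


def parallel_receptive_field(branches):
    """Largest receptive field among parallel branches whose outputs are fused."""
    return max(stacked_receptive_field(branch) for branch in branches)


# Deep baseline (hypothetical): eight plain 3x3 convolutions in sequence.
deep = [(3, 1)] * 8

# Shallow alternative (hypothetical): one 3x3 stem conv followed by three
# parallel 3x3 branches with dilations 2, 4 and 7. These rates are chosen
# for the example and are not taken from the paper.
stem = [(3, 1)]
shallow = [stem + [(3, d)] for d in (2, 4, 7)]

deep_rf = stacked_receptive_field(deep)
shallow_rf = parallel_receptive_field(shallow)
print("deep:", deep_rf, "shallow:", shallow_rf)
```

Under these assumptions both configurations reach a receptive field of 17 pixels along each axis, while the shallow one uses four 3x3 convolutions instead of eight. The actual parameter savings of any real design also depend on channel widths and on the fusion layers, which this sketch does not model.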


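Source journal

IEEE Transactions on Intelligent Transportation Systems (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 14.80

Self-citation rate: 12.90%

Articles published: 1872

Review time: 7.5 months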

Journal description: The journal covers the theoretical, experimental, and operational aspects of electrical and electronics engineering and information technologies as applied to Intelligent Transportation Systems (ITS). Intelligent Transportation Systems are defined as those systems that use synergistic technologies and systems engineering concepts to develop and improve transportation systems of all kinds. The scope of this interdisciplinary activity includes promoting, consolidating, and coordinating ITS technical activities among IEEE entities, and providing a focus for cooperative activities, both internal and external.