One-Stage Deep Stereo Network
Ziming Liu, E. Malis, Philippe Martinet
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), published 2024-04-14
DOI: https://doi.org/10.1109/icassp48485.2024.10446954
Citations: 0
Abstract
Stereo matching is a low-level visual perception task. Currently, two-stage 2D-3D networks and three-stage recurrent networks dominate deep stereo matching. These methods build a cost volume from low-resolution stereo feature maps, which splits the network into a feature net and a matching net. However, the 2D feature map is not controllable, and the low-resolution feature map loses important matching information. To overcome these problems, we propose the first one-stage 2D-3D deep stereo network, named StereoOne. It has an efficient module that builds a cost volume at image resolution in real time. Feature extraction and matching are learned in a single 3D network. In our experiments, the new network easily surpasses the 2D-3D network baseline and achieves performance competitive with the state of the art.
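
To make the notion of a cost volume at image resolution concrete, the following is a minimal sketch of a generic, hand-crafted construction. It is not the StereoOne module, whose design the abstract does not describe. It assumes rectified grayscale images, an absolute-difference matching cost, and hypothetical helper names (`build_cost_volume`, `winner_takes_all`) chosen only for illustration.

```python
import numpy as np


def build_cost_volume(left, right, max_disp):
    """Return a (max_disp, H, W) volume of absolute-difference matching costs.

    Assumes rectified grayscale images of identical shape (H, W). The cost
    is computed directly on pixels, so the volume has full image resolution.
    """
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    h, w = left.shape
    # Pixels with no valid match at a given disparity keep an infinite cost.
    volume = np.full((max_disp, h, w), np.inf, dtype=np.float32)
    for d in range(max_disp):
        # Left pixel (y, x) is compared with right pixel (y, x - d).
        volume[d, :, d:] = np.abs(left[:, d:] - right[:, : w - d])
    return volume


def winner_takes_all(volume):
    """Pick, for every pixel, the disparity with the lowest matching cost."""
    return np.argmin(volume, axis=0)
```

In a learned 2D-3D pipeline, a volume of this shape would be processed by 3D convolutions instead of a plain arg-min; the sketch only illustrates the data layout, with one cost slice per candidate disparity and one entry per image pixel.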