GLDC: combining global and local consistency of multibranch depth completion

Authors: Yaping Deng, Yingjiang Li, Zibo Wei, Keying Li
Journal: The Visual Computer
DOI: 10.1007/s00371-024-03609-7
Publication date: 2024-08-26
Abstract
Depth completion aims to generate dense depth maps from sparse depth maps and corresponding RGB images. In this task, the locality of convolutional layers makes it difficult for a network to capture global information, while Transformer-based architectures capture global context well but may lose local detail features. Consequently, attending to global and local information simultaneously is crucial for effective depth completion. This paper proposes a novel and effective dual-encoder, three-decoder network consisting of a local branch and a global branch. Specifically, the local branch uses a convolutional network and the global branch uses a Transformer network to extract rich features. Meanwhile, the local branch is dominated by the color image and the global branch by the depth map, so that multimodal information is thoroughly integrated and utilized. In addition, a gate fusion mechanism is used in the decoder stage to fuse local and global information, achieving high-performance depth completion. This hybrid architecture is conducive to the effective fusion of local detail and contextual information. Experimental results demonstrate the superiority of our method over other advanced methods on the KITTI Depth Completion and NYU v2 datasets.
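The abstract does not specify the exact form of the gate fusion mechanism. A common formulation for gated fusion of two feature streams (assumed here for illustration, not taken from the paper) is a learned sigmoid gate that forms a per-element convex combination of the local (CNN) and global (Transformer) features. The sketch below uses NumPy with a random linear map standing in for the learned gate parameters; the function name `gate_fusion` and the weight shapes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate_fusion(local_feat, global_feat, w, b):
    # Hypothetical gate fusion: concatenate the two branch features,
    # map them to a gate g in (0, 1), then blend per element:
    #   fused = g * local + (1 - g) * global
    stacked = np.concatenate([local_feat, global_feat], axis=-1)
    g = sigmoid(stacked @ w + b)
    return g * local_feat + (1.0 - g) * global_feat

# Toy example: 4 feature vectors of dimension 8 from each branch.
rng = np.random.default_rng(0)
local_feat = rng.standard_normal((4, 8))   # stand-in for CNN-branch features
global_feat = rng.standard_normal((4, 8))  # stand-in for Transformer-branch features
w = 0.1 * rng.standard_normal((16, 8))     # gate weights (would be learned)
b = np.zeros(8)

fused = gate_fusion(local_feat, global_feat, w, b)
print(fused.shape)  # (4, 8)
```

Because the gate lies strictly in (0, 1), every fused value is bounded elementwise by the corresponding local and global features, which is what lets the decoder trade off local detail against global context at each position.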