{"title":"用于单目深度估计的三维点云和变压器网络","authors":"Yu Hong, Xiaolong Liu, H. Dai, Wenqi Tao","doi":"10.1109/ICIET55102.2022.9779008","DOIUrl":null,"url":null,"abstract":"Estimating dense depth map from one image is a challenging task for computer vision. Because the same image can correspond to the infinite variety of 3D spaces. Neural networks have gradually achieved reasonable results on this task with the continuous development of deep learning. But the depth estimation method based on monocular cameras still has a gap in accuracy compared with multi-view or sensor-based methods. Thus, this paper proposes to supplement a limited number of sparse 3D point clouds combined with transformer processing to increase the accuracy of the monocular depth estimation model. The sparse 3D point clouds are used as supplementary geometric information and the 3D point clouds are input into the network with the RGB image. After five times integration, the multi-scale features are extracted, and then the swin transformer block is used to process the output feature map of the main network, further improving the accuracy. Experiments demonstrate that our network achieves better results than the best method on the current most commonly used dataset for monocular depth estimation, NYU Depth V2. However, the qualitative results are also better than the best method.","PeriodicalId":371262,"journal":{"name":"2022 10th International Conference on Information and Education Technology (ICIET)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PCTNet: 3D Point Cloud and Transformer Network for Monocular Depth Estimation\",\"authors\":\"Yu Hong, Xiaolong Liu, H. Dai, Wenqi Tao\",\"doi\":\"10.1109/ICIET55102.2022.9779008\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Estimating dense depth map from one image is a challenging task for computer vision. Because the same image can correspond to the infinite variety of 3D spaces. Neural networks have gradually achieved reasonable results on this task with the continuous development of deep learning. But the depth estimation method based on monocular cameras still has a gap in accuracy compared with multi-view or sensor-based methods. Thus, this paper proposes to supplement a limited number of sparse 3D point clouds combined with transformer processing to increase the accuracy of the monocular depth estimation model. The sparse 3D point clouds are used as supplementary geometric information and the 3D point clouds are input into the network with the RGB image. After five times integration, the multi-scale features are extracted, and then the swin transformer block is used to process the output feature map of the main network, further improving the accuracy. Experiments demonstrate that our network achieves better results than the best method on the current most commonly used dataset for monocular depth estimation, NYU Depth V2. 
However, the qualitative results are also better than the best method.\",\"PeriodicalId\":371262,\"journal\":{\"name\":\"2022 10th International Conference on Information and Education Technology (ICIET)\",\"volume\":\"20 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-04-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 10th International Conference on Information and Education Technology (ICIET)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICIET55102.2022.9779008\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 10th International Conference on Information and Education Technology (ICIET)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIET55102.2022.9779008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract: Estimating a dense depth map from a single image is a challenging computer vision task, because the same image can correspond to infinitely many 3D scenes. With the continuous development of deep learning, neural networks have gradually achieved reasonable results on this task, but monocular depth estimation still lags behind multi-view and sensor-based methods in accuracy. This paper therefore proposes to supplement the input with a limited number of sparse 3D points, combined with transformer processing, to improve the accuracy of a monocular depth estimation model. The sparse 3D point cloud serves as supplementary geometric information and is fed into the network together with the RGB image. Multi-scale features are extracted through five stages of fusion, and a Swin Transformer block then processes the output feature map of the backbone, further improving accuracy. Experiments demonstrate that our network outperforms the previous best method on NYU Depth V2, currently the most commonly used dataset for monocular depth estimation, and the qualitative results are likewise better.
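The abstract only outlines the architecture. As a rough illustration of the described pipeline (RGB image plus sparse point cloud fused at the input, five feature-extraction stages, and a transformer block refining the backbone output), here is a minimal PyTorch sketch. It is a guess at the structure, not the authors' implementation: the module name PCTNetSketch, the channel widths, the early-fusion strategy, the rasterization of the point cloud into a sparse depth channel, and the plain TransformerEncoderLayer standing in for the Swin Transformer block are all assumptions.

```python
import torch
import torch.nn as nn

class PCTNetSketch(nn.Module):
    """Hypothetical sketch: an RGB image and a sparse depth map (assumed to be
    rasterized from the 3D point cloud) are concatenated, passed through five
    encoder stages that produce multi-scale features, and the final feature
    map is refined by a transformer block before a depth head. All sizes and
    names are illustrative."""
    def __init__(self, base=32):
        super().__init__()
        chans = [base * 2 ** i for i in range(5)]  # five fusion stages
        stages, in_ch = [], 4  # 3 RGB channels + 1 sparse-depth channel
        for out_ch in chans:
            stages.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            ))
            in_ch = out_ch
        self.stages = nn.ModuleList(stages)
        # Stand-in for the paper's Swin Transformer block: a plain
        # TransformerEncoderLayer applied over the flattened spatial tokens
        # of the backbone's output feature map.
        self.attn = nn.TransformerEncoderLayer(
            d_model=chans[-1], nhead=8, batch_first=True)
        self.head = nn.Conv2d(chans[-1], 1, 1)  # per-pixel depth prediction

    def forward(self, rgb, sparse_depth):
        x = torch.cat([rgb, sparse_depth], dim=1)  # early fusion of modalities
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # multi-scale features, one per stage
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)          # (B, H*W, C)
        x = self.attn(tokens).transpose(1, 2).reshape(b, c, h, w)
        return self.head(x)                            # coarse depth map

# Usage: a 480x640 NYU Depth V2 frame with a rasterized sparse point cloud.
net = PCTNetSketch()
depth = net(torch.randn(1, 3, 480, 640), torch.randn(1, 1, 480, 640))
print(depth.shape)  # torch.Size([1, 1, 15, 20])
```

A real implementation would also need a decoder to upsample the coarse prediction back to input resolution and would use the stored multi-scale features as skip connections; this sketch stops at the fused, transformer-refined feature map to keep the data flow readable.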