CenterFormer:用于3D对象检测的基于中心的变压器

Zixiang Zhou, Xian Zhao, Yu Wang, Panqu Wang, H. Foroosh
{"title":"CenterFormer:用于3D对象检测的基于中心的变压器","authors":"Zixiang Zhou, Xian Zhao, Yu Wang, Panqu Wang, H. Foroosh","doi":"10.48550/arXiv.2209.05588","DOIUrl":null,"url":null,"abstract":"Query-based transformer has shown great potential in constructing long-range attention in many image-domain tasks, but has rarely been considered in LiDAR-based 3D object detection due to the overwhelming size of the point cloud data. In this paper, we propose CenterFormer, a center-based transformer network for 3D object detection. CenterFormer first uses a center heatmap to select center candidates on top of a standard voxel-based point cloud encoder. It then uses the feature of the center candidate as the query embedding in the transformer. To further aggregate features from multiple frames, we design an approach to fuse features through cross-attention. Lastly, regression heads are added to predict the bounding box on the output center feature representation. Our design reduces the convergence difficulty and computational complexity of the transformer structure. The results show significant improvements over the strong baseline of anchor-free object detection networks. CenterFormer achieves state-of-the-art performance for a single model on the Waymo Open Dataset, with 73.7% mAPH on the validation set and 75.6% mAPH on the test set, significantly outperforming all previously published CNN and transformer-based methods. Our code is publicly available at https://github.com/TuSimple/centerformer","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"139 1","pages":"496-513"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"51","resultStr":"{\"title\":\"CenterFormer: Center-based Transformer for 3D Object Detection\",\"authors\":\"Zixiang Zhou, Xian Zhao, Yu Wang, Panqu Wang, H. Foroosh\",\"doi\":\"10.48550/arXiv.2209.05588\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Query-based transformer has shown great potential in constructing long-range attention in many image-domain tasks, but has rarely been considered in LiDAR-based 3D object detection due to the overwhelming size of the point cloud data. In this paper, we propose CenterFormer, a center-based transformer network for 3D object detection. CenterFormer first uses a center heatmap to select center candidates on top of a standard voxel-based point cloud encoder. It then uses the feature of the center candidate as the query embedding in the transformer. To further aggregate features from multiple frames, we design an approach to fuse features through cross-attention. Lastly, regression heads are added to predict the bounding box on the output center feature representation. Our design reduces the convergence difficulty and computational complexity of the transformer structure. The results show significant improvements over the strong baseline of anchor-free object detection networks. CenterFormer achieves state-of-the-art performance for a single model on the Waymo Open Dataset, with 73.7% mAPH on the validation set and 75.6% mAPH on the test set, significantly outperforming all previously published CNN and transformer-based methods. Our code is publicly available at https://github.com/TuSimple/centerformer\",\"PeriodicalId\":72676,\"journal\":{\"name\":\"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision\",\"volume\":\"139 1\",\"pages\":\"496-513\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"51\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2209.05588\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2209.05588","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 51

摘要

在许多图像域任务中,基于查询的变换在构建远程注意力方面显示出巨大的潜力,但由于点云数据的压倒性规模,在基于lidar的三维目标检测中很少被考虑。在本文中,我们提出了CenterFormer,一个基于中心的变压器网络,用于三维目标检测。CenterFormer首先使用中心热图在标准的基于体素的点云编码器上选择中心候选者。然后使用中心候选的特征作为查询嵌入到转换器中。为了进一步聚合多帧的特征,我们设计了一种通过交叉关注来融合特征的方法。最后,加入回归头来预测输出中心特征表示上的边界框。我们的设计降低了变压器结构的收敛难度和计算复杂度。结果表明,与无锚点目标检测网络的强基线相比,该方法有显著的改进。CenterFormer在Waymo开放数据集上对单个模型实现了最先进的性能,在验证集上的mAPH为73.7%,在测试集上的mAPH为75.6%,显著优于之前发布的所有基于CNN和变压器的方法。我们的代码可以在https://github.com/TuSimple/centerformer上公开获得
本文章由计算机程序翻译,如有差异,请以英文原文为准。
CenterFormer: Center-based Transformer for 3D Object Detection
Query-based transformer has shown great potential in constructing long-range attention in many image-domain tasks, but has rarely been considered in LiDAR-based 3D object detection due to the overwhelming size of the point cloud data. In this paper, we propose CenterFormer, a center-based transformer network for 3D object detection. CenterFormer first uses a center heatmap to select center candidates on top of a standard voxel-based point cloud encoder. It then uses the feature of the center candidate as the query embedding in the transformer. To further aggregate features from multiple frames, we design an approach to fuse features through cross-attention. Lastly, regression heads are added to predict the bounding box on the output center feature representation. Our design reduces the convergence difficulty and computational complexity of the transformer structure. The results show significant improvements over the strong baseline of anchor-free object detection networks. CenterFormer achieves state-of-the-art performance for a single model on the Waymo Open Dataset, with 73.7% mAPH on the validation set and 75.6% mAPH on the test set, significantly outperforming all previously published CNN and transformer-based methods. Our code is publicly available at https://github.com/TuSimple/centerformer
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信