LaRa:多相机鸟瞰语义分割的潜势和光线

Florent Bartoccioni, 'Eloi Zablocki, Andrei Bursuc, Patrick P'erez, M. Cord, Alahari Karteek
{"title":"LaRa:多相机鸟瞰语义分割的潜势和光线","authors":"Florent Bartoccioni, 'Eloi Zablocki, Andrei Bursuc, Patrick P'erez, M. Cord, Alahari Karteek","doi":"10.48550/arXiv.2206.13294","DOIUrl":null,"url":null,"abstract":"Recent works in autonomous driving have widely adopted the bird's-eye-view (BEV) semantic map as an intermediate representation of the world. Online prediction of these BEV maps involves non-trivial operations such as multi-camera data extraction as well as fusion and projection into a common topview grid. This is usually done with error-prone geometric operations (e.g., homography or back-projection from monocular depth estimation) or expensive direct dense mapping between image pixels and pixels in BEV (e.g., with MLP or attention). In this work, we present 'LaRa', an efficient encoder-decoder, transformer-based model for vehicle semantic segmentation from multiple cameras. Our approach uses a system of cross-attention to aggregate information over multiple sensors into a compact, yet rich, collection of latent representations. These latent representations, after being processed by a series of self-attention blocks, are then reprojected with a second cross-attention in the BEV space. We demonstrate that our model outperforms the best previous works using transformers on nuScenes. The code and trained models are available at https://github.com/valeoai/LaRa","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation\",\"authors\":\"Florent Bartoccioni, 'Eloi Zablocki, Andrei Bursuc, Patrick P'erez, M. Cord, Alahari Karteek\",\"doi\":\"10.48550/arXiv.2206.13294\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent works in autonomous driving have widely adopted the bird's-eye-view (BEV) semantic map as an intermediate representation of the world. Online prediction of these BEV maps involves non-trivial operations such as multi-camera data extraction as well as fusion and projection into a common topview grid. This is usually done with error-prone geometric operations (e.g., homography or back-projection from monocular depth estimation) or expensive direct dense mapping between image pixels and pixels in BEV (e.g., with MLP or attention). In this work, we present 'LaRa', an efficient encoder-decoder, transformer-based model for vehicle semantic segmentation from multiple cameras. Our approach uses a system of cross-attention to aggregate information over multiple sensors into a compact, yet rich, collection of latent representations. These latent representations, after being processed by a series of self-attention blocks, are then reprojected with a second cross-attention in the BEV space. We demonstrate that our model outperforms the best previous works using transformers on nuScenes. The code and trained models are available at https://github.com/valeoai/LaRa\",\"PeriodicalId\":273870,\"journal\":{\"name\":\"Conference on Robot Learning\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-06-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Conference on Robot Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2206.13294\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference on Robot Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2206.13294","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12

摘要

近年来,自动驾驶领域广泛采用鸟瞰图(BEV)语义图作为对世界的中间表示。这些BEV地图的在线预测涉及一些重要的操作,如多相机数据提取以及融合和投影到一个共同的俯视图网格中。这通常是通过容易出错的几何操作(例如,单目深度估计的同形或反向投影)或BEV中图像像素和像素之间昂贵的直接密集映射(例如,使用MLP或注意力)来完成的。在这项工作中,我们提出了“LaRa”,一种高效的编码器-解码器,基于变压器的模型,用于从多个摄像头中进行车辆语义分割。我们的方法使用交叉注意系统将多个传感器上的信息聚合成一个紧凑而丰富的潜在表征集合。这些潜在表征在经过一系列自我注意块的处理后,然后在BEV空间中用第二次交叉注意重新投射。我们证明了我们的模型优于以前在nuScenes上使用变压器的最佳作品。代码和经过训练的模型可在https://github.com/valeoai/LaRa上获得
本文章由计算机程序翻译,如有差异,请以英文原文为准。
LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation
Recent works in autonomous driving have widely adopted the bird's-eye-view (BEV) semantic map as an intermediate representation of the world. Online prediction of these BEV maps involves non-trivial operations such as multi-camera data extraction as well as fusion and projection into a common topview grid. This is usually done with error-prone geometric operations (e.g., homography or back-projection from monocular depth estimation) or expensive direct dense mapping between image pixels and pixels in BEV (e.g., with MLP or attention). In this work, we present 'LaRa', an efficient encoder-decoder, transformer-based model for vehicle semantic segmentation from multiple cameras. Our approach uses a system of cross-attention to aggregate information over multiple sensors into a compact, yet rich, collection of latent representations. These latent representations, after being processed by a series of self-attention blocks, are then reprojected with a second cross-attention in the BEV space. We demonstrate that our model outperforms the best previous works using transformers on nuScenes. The code and trained models are available at https://github.com/valeoai/LaRa
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信