Jingzhong Li, Lin Yang, Zhen Shi, Yuxuan Chen, Yue Jin, Kanta Akiyama, Anze Xu
{"title":"SparseDet:通过稀疏场景表示实现高效的多视角三维物体检测","authors":"Jingzhong Li, Lin Yang, Zhen Shi, Yuxuan Chen, Yue Jin, Kanta Akiyama, Anze Xu","doi":"10.1016/j.aei.2024.102955","DOIUrl":null,"url":null,"abstract":"<div><div>Efficient and reliable 3D object detection via multi-view cameras is pivotal for improving the safety and facilitating the cost-effective deployment of autonomous driving systems. However, owing to the learning of <em>dense</em> scene representations, existing methods still suffer from high computational costs and excessive noise, affecting the efficiency and accuracy of the inference process. To overcome this challenge, we propose SparseDet, a model that exploits <em>sparse</em> scene representations. Specifically, a sparse sampling module with category-aware and geometry-aware supervision is first introduced to adaptively sample foreground features at both semantic and instance levels. Additionally, to conserve computational resources while retaining context information, we propose a background aggregation module designed to compress extensive background features into a compact set. These strategies can markedly diminish feature volume while preserving essential information to boost computational efficiency without compromising accuracy. Due to the efficient sparse scene representation, our SparseDet achieves leading performance on the widely used nuScenes benchmark. Comprehensive experiments validate that SparseDet surpasses the PETR while reducing the decoder computational complexity by 47% in terms of FLOPs, facilitating a leading inference speed of 35.6 FPS on a single RTX3090 GPU.</div></div>","PeriodicalId":50941,"journal":{"name":"Advanced Engineering Informatics","volume":"62 ","pages":"Article 102955"},"PeriodicalIF":8.0000,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SparseDet: Towards efficient multi-view 3D object detection via sparse scene representation\",\"authors\":\"Jingzhong Li, Lin Yang, Zhen Shi, Yuxuan Chen, Yue Jin, Kanta Akiyama, Anze Xu\",\"doi\":\"10.1016/j.aei.2024.102955\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Efficient and reliable 3D object detection via multi-view cameras is pivotal for improving the safety and facilitating the cost-effective deployment of autonomous driving systems. However, owing to the learning of <em>dense</em> scene representations, existing methods still suffer from high computational costs and excessive noise, affecting the efficiency and accuracy of the inference process. To overcome this challenge, we propose SparseDet, a model that exploits <em>sparse</em> scene representations. Specifically, a sparse sampling module with category-aware and geometry-aware supervision is first introduced to adaptively sample foreground features at both semantic and instance levels. Additionally, to conserve computational resources while retaining context information, we propose a background aggregation module designed to compress extensive background features into a compact set. These strategies can markedly diminish feature volume while preserving essential information to boost computational efficiency without compromising accuracy. Due to the efficient sparse scene representation, our SparseDet achieves leading performance on the widely used nuScenes benchmark. 
Comprehensive experiments validate that SparseDet surpasses the PETR while reducing the decoder computational complexity by 47% in terms of FLOPs, facilitating a leading inference speed of 35.6 FPS on a single RTX3090 GPU.</div></div>\",\"PeriodicalId\":50941,\"journal\":{\"name\":\"Advanced Engineering Informatics\",\"volume\":\"62 \",\"pages\":\"Article 102955\"},\"PeriodicalIF\":8.0000,\"publicationDate\":\"2024-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Advanced Engineering Informatics\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1474034624006062\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advanced Engineering Informatics","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1474034624006062","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
SparseDet: Towards efficient multi-view 3D object detection via sparse scene representation
Efficient and reliable 3D object detection via multi-view cameras is pivotal for improving the safety and facilitating the cost-effective deployment of autonomous driving systems. However, owing to the learning of dense scene representations, existing methods still suffer from high computational costs and excessive noise, affecting the efficiency and accuracy of the inference process. To overcome this challenge, we propose SparseDet, a model that exploits sparse scene representations. Specifically, a sparse sampling module with category-aware and geometry-aware supervision is first introduced to adaptively sample foreground features at both the semantic and instance levels. Additionally, to conserve computational resources while retaining context information, we propose a background aggregation module that compresses extensive background features into a compact set. These strategies markedly reduce the feature volume while preserving essential information, boosting computational efficiency without compromising accuracy. Owing to its efficient sparse scene representation, SparseDet achieves leading performance on the widely used nuScenes benchmark. Comprehensive experiments validate that SparseDet surpasses PETR while reducing the decoder's computational complexity by 47% in terms of FLOPs, achieving a leading inference speed of 35.6 FPS on a single RTX 3090 GPU.
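To make the abstract's two ideas concrete, the following minimal PyTorch sketch illustrates one plausible reading of a sparse scene representation: a per-token foreground score drives top-k sampling of foreground features, and the remaining tokens are pooled into a handful of background context slots. The module name SparseSceneTokens, the token budgets, the top-k rule, and the softmax-weighted background pooling are all assumptions made for illustration; the abstract does not specify these details, and the paper's category-aware and geometry-aware supervision is omitted entirely.

# Illustrative sketch only -- not the authors' implementation.
import torch
import torch.nn as nn

class SparseSceneTokens(nn.Module):
    """Keep a small set of foreground tokens and compress the background."""

    def __init__(self, dim=256, num_fg=900, num_bg=16):
        super().__init__()
        self.num_fg = num_fg                      # foreground token budget (assumed)
        self.fg_score = nn.Linear(dim, 1)         # per-token foreground score
        self.bg_assign = nn.Linear(dim, num_bg)   # soft assignment to background slots

    def forward(self, feats):
        # feats: (B, N, C) flattened multi-view image features, N is large
        B, N, C = feats.shape
        score = self.fg_score(feats).squeeze(-1)          # (B, N)

        # Sparse sampling: retain only the highest-scoring (likely foreground)
        # tokens; the paper supervises this selection with category- and
        # geometry-aware targets, which is not reproduced here.
        k = min(self.num_fg, N)
        fg_idx = score.topk(k, dim=1).indices             # (B, k)
        fg = feats.gather(1, fg_idx.unsqueeze(-1).expand(-1, -1, C))

        # Background aggregation: compress the remaining tokens into a few
        # context slots via softmax-weighted pooling (an assumed mechanism).
        mask = torch.ones(B, N, device=feats.device, dtype=torch.bool)
        mask.scatter_(1, fg_idx, False)                   # True marks background tokens
        w = self.bg_assign(feats).masked_fill(~mask.unsqueeze(-1), float("-inf"))
        w = w.softmax(dim=1)                              # (B, N, num_bg)
        bg = torch.einsum("bnm,bnc->bmc", w, feats)       # (B, num_bg, C)

        # A detection decoder would then attend over k + num_bg tokens instead of N.
        return torch.cat([fg, bg], dim=1)

# Usage: roughly 4,000 image tokens compressed to 916 before the decoder.
tokens = SparseSceneTokens()(torch.randn(2, 4000, 256))
print(tokens.shape)  # torch.Size([2, 916, 256])

Under this reading, the decoder's cost scales with the small number of retained tokens rather than the full dense feature map, which is consistent with the reported reduction in decoder FLOPs.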
About the journal:
Advanced Engineering Informatics is an international Journal that solicits research papers with an emphasis on 'knowledge' and 'engineering applications'. The Journal seeks original papers that report progress in applying methods of engineering informatics. These papers should have engineering relevance and help provide a scientific base for more reliable, spontaneous, and creative engineering decision-making. Additionally, papers should demonstrate the science of supporting knowledge-intensive engineering tasks and validate the generality, power, and scalability of new methods through rigorous evaluation, preferably both qualitatively and quantitatively. Abstracting and indexing for Advanced Engineering Informatics include Science Citation Index Expanded, Scopus and INSPEC.