Dongxu Lyu;Zhenyu Li;Yansong Xu;Gang Wang;Wenjie Li;Yuzhou Chen;Liyan Chen;Weifeng He;Guanghui He
DOI: 10.1109/TCSI.2025.3555837
Journal: IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 72, no. 7, pp. 3272-3285 (impact factor 5.2; JCR Q1, Engineering, Electrical & Electronic)
Publication date: 2025-04-07
Publication type: Journal Article
URL: https://ieeexplore.ieee.org/document/10950425/
An Efficient Multi-View Cross-Attention Accelerator for Vision-Centric 3D Perception in Autonomous Driving
Vision-centric 3D perception has become a key mechanism in autonomous driving. It achieves exceptional perceptual performance mainly by introducing a novel attention mechanism, multi-view cross-attention (MVCA), for learnable feature extraction and fusion from surround-view cameras. Despite its superiority, MVCA suffers from severe inefficiencies in sampling, processing elements (PE), and pipelined processing, owing to redundant and non-uniform sampling-aggregation and rigorous inter-operator dependencies. To address these issues, this article proposes MVAtor, a dedicated MVCA accelerator with algorithm-architecture co-optimization that flexibly supports vision-centric 3D perception based on multi-view inputs. For sample inefficiency, a 3-tier hybrid static-dynamic sampling approach and a sensitivity-aware feature pruning approach are proposed to eliminate 86.03% of the sample overhead and 24.48% of the memory requirement, while incurring only <1% [...] $53.7\sim 96.1\%$ energy-delay product reduction. For pipeline inefficiency, a fine-grained-tiling-assisted, highly pipelined architecture is constructed in MVAtor by exploiting the decoupling opportunities in inter-view sparsity, thereby saving 61.03% of external memory access while boosting overall throughput by $1.83\times$. Extensively evaluated on representative benchmarks, MVAtor attains $1.38\sim 7.67\times$ and $1.67\sim 11.15\times$ improvements in energy and area efficiency, respectively, compared to state-of-the-art related accelerators.
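To make the sampling-aggregation step and the notion of inter-view sparsity more concrete, the following is a minimal, illustrative sketch and not the MVAtor design or the paper's algorithm. It assumes a toy setup: six pinhole cameras with a 90-degree field of view spaced 60 degrees apart, single-channel feature maps, and plain averaging in place of learned attention weights. The names `project`, `bilinear`, and `mvca_sample` are hypothetical.

```python
import math

def project(point, yaw, focal=1.0):
    """Project a 3D point into a toy pinhole camera rotated by `yaw` (radians)
    about the vertical axis. Returns normalized (u, v), or None if the point
    lies behind the camera. (Hypothetical camera model, for illustration.)"""
    x, y, z = point
    depth = math.cos(yaw) * x + math.sin(yaw) * y      # along the optical axis
    lateral = -math.sin(yaw) * x + math.cos(yaw) * y   # sideways offset
    if depth <= 1e-6:
        return None
    return focal * lateral / depth, focal * z / depth

def bilinear(feat, u, v):
    """Bilinearly sample a 2D feature map (list of rows) at (u, v) in [-1, 1]."""
    h, w = len(feat), len(feat[0])
    px, py = (u + 1) / 2 * (w - 1), (v + 1) / 2 * (h - 1)
    x0, y0 = int(math.floor(px)), int(math.floor(py))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = px - x0, py - y0
    top = feat[y0][x0] * (1 - ax) + feat[y0][x1] * ax
    bottom = feat[y1][x0] * (1 - ax) + feat[y1][x1] * ax
    return top * (1 - ay) + bottom * ay

def mvca_sample(points, yaws, feats):
    """For each 3D reference point, sample only the views it is visible in
    and average the sampled features. Returns (outputs, number of samples)."""
    outputs, hits = [], 0
    for p in points:
        samples = []
        for yaw, feat in zip(yaws, feats):
            uv = project(p, yaw)
            if uv is None or abs(uv[0]) > 1 or abs(uv[1]) > 1:
                continue  # point falls outside this view: skip the sample
            samples.append(bilinear(feat, *uv))
        hits += len(samples)
        outputs.append(sum(samples) / len(samples) if samples else 0.0)
    return outputs, hits

# Six surround-view cameras; view k carries a constant feature value k + 1.
yaws = [math.radians(60 * k) for k in range(6)]
feats = [[[float(k + 1)] * 4 for _ in range(4)] for k in range(6)]
# Two reference points on the ground plane, at 0 and 25 degrees azimuth.
points = [(10.0, 0.0, 0.0),
          (10 * math.cos(math.radians(25)), 10 * math.sin(math.radians(25)), 0.0)]

outputs, hits = mvca_sample(points, yaws, feats)
print(f"{hits} of {len(points) * len(yaws)} (point, view) pairs produced a sample")
```

In this toy configuration only 3 of the 12 (point, view) pairs yield a sample, which is the kind of sparsity a dense per-view implementation would waste work on. The numbers here come from the assumed geometry and are unrelated to the percentages reported in the abstract.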
About the journal:
TCAS I publishes regular papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest covered includes:
- Circuits: Analog, Digital and Mixed Signal Circuits and Systems
- Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems
- Software for Analog-and-Logic Circuits and Systems
- Control aspects of Circuits and Systems