An Efficient Multi-View Cross-Attention Accelerator for Vision-Centric 3D Perception in Autonomous Driving

IF 5.2 · CAS Tier 1 (Engineering & Technology) · JCR Q1 · ENGINEERING, ELECTRICAL & ELECTRONIC
Dongxu Lyu;Zhenyu Li;Yansong Xu;Gang Wang;Wenjie Li;Yuzhou Chen;Liyan Chen;Weifeng He;Guanghui He
{"title":"An Efficient Multi-View Cross-Attention Accelerator for Vision-Centric 3D Perception in Autonomous Driving","authors":"Dongxu Lyu;Zhenyu Li;Yansong Xu;Gang Wang;Wenjie Li;Yuzhou Chen;Liyan Chen;Weifeng He;Guanghui He","doi":"10.1109/TCSI.2025.3555837","DOIUrl":null,"url":null,"abstract":"Vision-centric 3D perception has become a key mechanism in autonomous driving. It achieves exceptional perceptual performance mainly by introducing a novel attention, multi-view cross-attention (MVCA), for learnable feature extraction and fusion from surround-view cameras. Despite its superiority, MVCA encounters severe inefficiencies in sample, processing elements (PE), and pipelined processing, owing to the redundant and non-uniform sampling-aggregation and rigorous inter-operator dependencies. To address these issues, this article proposes a dedicated MVCA accelerator, MVAtor, with algorithm-architecture co-optimization for vision-centric 3D perception based on multi-view inputs flexibly. For sample inefficiency, a 3-tier hybrid static-dynamic sample and a sensitivity-aware feature pruning approach are proposed to eliminate the 86.03% sample overhead and 24.48% memory requirement, only incuring <1%> <tex-math>$53.7\\sim 96.1$ </tex-math></inline-formula>% energy-delay product reduction. For pipeline inefficiency, a fine-grained-tiling assisted highly-pipelined architecture is constructed in MVAtor by exploiting the decoupling opportunities on inter-view sparsity, thereby saving 61.03% external memory access while boosting the overall throughputs by <inline-formula> <tex-math>$1.83\\times $ </tex-math></inline-formula>. Extensively evaluated on representative benchmarks, MVAtor attains <inline-formula> <tex-math>$1.38\\sim 7.67\\times $ </tex-math></inline-formula> and <inline-formula> <tex-math>$1.67\\sim 11.15\\times $ </tex-math></inline-formula> improvement on energy and area efficiency respectively, compared to the state-of-the-art related accelerators.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 7","pages":"3272-3285"},"PeriodicalIF":5.2000,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems I: Regular Papers","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10950425/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Vision-centric 3D perception has become a key mechanism in autonomous driving. It achieves exceptional perceptual performance mainly by introducing a novel attention mechanism, multi-view cross-attention (MVCA), for learnable feature extraction and fusion from surround-view cameras. Despite its superiority, MVCA suffers severe sampling, processing-element (PE), and pipeline inefficiencies, owing to redundant and non-uniform sampling-aggregation and strict inter-operator dependencies. To address these issues, this article proposes MVAtor, a dedicated MVCA accelerator with algorithm-architecture co-optimization for vision-centric 3D perception that flexibly handles multi-view inputs. For sampling inefficiency, a 3-tier hybrid static-dynamic sampling scheme and a sensitivity-aware feature pruning approach are proposed, eliminating 86.03% of the sampling overhead and 24.48% of the memory requirement while incurring <1% accuracy loss. For PE inefficiency, MVAtor achieves a 53.7-96.1% energy-delay product reduction. For pipeline inefficiency, a fine-grained-tiling-assisted, highly pipelined architecture is constructed in MVAtor by exploiting decoupling opportunities in inter-view sparsity, saving 61.03% of external memory accesses while boosting overall throughput by 1.83x. Extensively evaluated on representative benchmarks, MVAtor attains 1.38-7.67x and 1.67-11.15x improvements in energy efficiency and area efficiency, respectively, over state-of-the-art related accelerators.
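To make the bottleneck concrete, below is a minimal PyTorch sketch of the sampling-aggregation pattern behind MVCA, in the deformable-attention / BEVFormer style the abstract alludes to: each query predicts a few sampling offsets and aggregation weights, gathers features only from the camera views its reference point projects into, and averages over the hit views. All names, shapes, and the precomputed ref_pts/hit_mask inputs are illustrative assumptions, not MVAtor's actual implementation; the per-query scattered reads and the per-view loop are exactly the non-uniform sampling and inter-view dependencies the accelerator targets.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewCrossAttention(nn.Module):
    """Illustrative MVCA sketch (deformable-attention style), not MVAtor's design."""

    def __init__(self, dim=256, n_points=4):
        super().__init__()
        self.n_points = n_points
        self.offset = nn.Linear(dim, n_points * 2)  # per-query sampling offsets (pixels)
        self.weight = nn.Linear(dim, n_points)      # per-query aggregation weights
        self.proj = nn.Linear(dim, dim)             # output projection

    def forward(self, query, feats, ref_pts, hit_mask):
        # query:    [B, Q, C]       BEV/object queries
        # feats:    [B, V, C, H, W] surround-view image features
        # ref_pts:  [B, V, Q, 2]    projected 2D reference points in [0, 1]
        # hit_mask: [B, V, Q]       True where the query projects into view v
        B, V, C, H, W = feats.shape
        Q, P = query.shape[1], self.n_points
        offsets = self.offset(query).view(B, 1, Q, P, 2)
        weights = self.weight(query).softmax(-1).view(B, 1, Q, P)
        # Normalize pixel offsets and map locations to grid_sample's [-1, 1] range.
        loc = (ref_pts.unsqueeze(3) + offsets / query.new_tensor([W, H])) * 2 - 1
        out = query.new_zeros(B, Q, C)
        hits = hit_mask.sum(1).clamp(min=1).unsqueeze(-1)  # views hit per query
        for v in range(V):  # the inter-view sparsity MVAtor's pipeline exploits
            # Bilinear gather of P points per query from this view: [B, C, Q, P].
            sampled = F.grid_sample(feats[:, v], loc[:, v], align_corners=False)
            agg = (sampled * weights[:, v].unsqueeze(1)).sum(-1)  # [B, C, Q]
            out = out + agg.transpose(1, 2) * hit_mask[:, v].unsqueeze(-1)
        return self.proj(out / hits)

# Toy shapes: 6 cameras, 16x44 feature maps, 200 queries.
mvca = MultiViewCrossAttention(dim=256)
q = torch.randn(1, 200, 256)
f = torch.randn(1, 6, 256, 16, 44)
r = torch.rand(1, 6, 200, 2)
m = torch.rand(1, 6, 200) < 0.3  # each query hits only ~2 of 6 views
print(mvca(q, f, r, m).shape)    # torch.Size([1, 200, 256])
```

Note how most (query, view) pairs in the toy example are misses: a hardware design that samples all of them uniformly wastes the bulk of its work, which is the sampling redundancy the paper's hybrid static-dynamic scheme prunes.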
Source Journal
IEEE Transactions on Circuits and Systems I: Regular Papers
CiteScore: 9.80
Self-citation rate: 11.80%
Annual articles: 441
Review time: 2 months
About the journal: TCAS I publishes regular papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest covered includes:
- Analog, digital and mixed-signal circuits and systems
- Nonlinear circuits and systems, integrated sensors, MEMS and systems on chip, nanoscale circuits and systems, optoelectronic circuits and systems
- Power electronics and systems
- Software for analog-and-logic circuits and systems
- Control aspects of circuits and systems