Title: A 36 mJ/Inf Convolution Accelerator With Reduced Memory Access and Regrouped Sparse Kernels for Environment Sound Classification on Edge Devices
Authors: Lichen Feng; Tao Wang; Rundong Cai; Feng Min; Zhangming Zhu
Journal: IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 9, pp. 1258-1262
Publication date: 2025-07-03
DOI: 10.1109/TCSII.2025.3585516
URL: https://ieeexplore.ieee.org/document/11068177/
Impact factor: 4.9 (JCR Q2, Engineering, Electrical & Electronic)
Citations: 0
Abstract
Efficient environment sound classification (ESC) on edge devices is valuable for applications requiring continuous, long-term monitoring. Existing ESC processors have demonstrated large reductions in latency and resource usage. However, model sparsity and computation flow still require further optimization. In this brief, we propose an end-to-end ultra-lightweight Depthwise Separable Convolution (DSC) neural network, E2E-ULDSC-Pruned, which is released as open source. To implement this model, a customized accelerator featuring pipelined DSC computation and regrouped sparse kernels is developed. It achieves 36 mJ/inference on a ZCU102 FPGA (254 ms latency and 143 mW power consumption), which is superior to recent works.
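The headline energy figure follows from the latency and power reported in the abstract, since energy is power multiplied by time. The short Python check below is only an illustrative sketch: the function name `energy_per_inference_mj` is hypothetical, and it assumes the reported 143 mW is the average power over the full 254 ms inference.

```python
def energy_per_inference_mj(latency_ms: float, power_mw: float) -> float:
    """Return energy per inference in millijoules.

    Assumes power_mw is the average power drawn over the whole
    inference window (an assumption, not stated in the abstract).
    """
    latency_s = latency_ms / 1000.0   # milliseconds -> seconds
    power_w = power_mw / 1000.0       # milliwatts -> watts
    energy_j = power_w * latency_s    # joules = watts * seconds
    return energy_j * 1000.0          # joules -> millijoules


# Figures taken from the abstract: 254 ms latency, 143 mW power.
energy_mj = energy_per_inference_mj(latency_ms=254.0, power_mw=143.0)
print(f"{energy_mj:.1f} mJ per inference")  # prints "36.3 mJ per inference"
```

The result, about 36.3 mJ, is consistent with the 36 mJ/inference quoted in the title and abstract.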
About the journal:
TCAS II publishes brief papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest covered includes:
Circuits: Analog, Digital and Mixed Signal Circuits and Systems
Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems
Software for Analog-and-Logic Circuits and Systems
Control aspects of Circuits and Systems.