An On-Chip-Training Keyword-Spotting Chip Using Interleaved Pipeline and Computation-in-Memory Cluster in 28-nm CMOS
Junyi Qian, Cai Li, Long Chen, Ruidong Li, Tuo Li, Peng Cao, Xin Si, Weiwei Shan
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 5, pp. 1497-1501, published 2025-01-10
DOI: 10.1109/TVLSI.2025.3525740
URL: https://ieeexplore.ieee.org/document/10838343/
To improve the precision of keyword spotting (KWS) for individual users on edge devices, we propose an on-chip-training KWS (OCT-KWS) chip that protects private data while also achieving ultralow-power inference. Our main contributions are: 1) identity interchange and interleaved pipeline methods during backpropagation (BP), which enable pipelined execution of operations that traditionally had to be performed sequentially and reduce the cache required for loss values by 95.8%; 2) an all-digital isolated-bitline (BL)-based computation-in-memory (CIM) macro, which eliminates ineffective computations caused by glitches and achieves 2.03× higher energy efficiency; and 3) a multisize CIM cluster-based BP data flow, in which the CIM macros are designed collaboratively so that every macro is fully utilized at all times, reducing output feature map (Ofmap) accesses by 47.2%. Fabricated in 28-nm CMOS and enhanced with a refined library characterization methodology, the chip achieves both the highest training energy efficiency (101.5 TOPS/W) and the lowest inference energy (9.9 nJ/decision) among current KWS chips. By retraining a three-class depthwise-separable convolutional neural network (DSCNN), detection accuracy on the private dataset increases from 80.8% to 98.9%.
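The abstract names a depthwise-separable CNN (DSCNN) as the retrained model but does not give its layer configuration. As a general illustration of the technique only (not the authors' network or hardware mapping), the sketch below factors a convolution into a per-channel KxK depthwise step followed by a 1x1 pointwise step, and compares multiply-accumulate (MAC) counts with a standard convolution. The helper names (`depthwise_conv`, `pointwise_conv`, `ds_conv`, `macs`), stride 1 with no padding, and the example tensor sizes are all hypothetical choices made for this sketch.

```python
from typing import List, Tuple

Tensor3 = List[List[List[float]]]  # layout: [channels][height][width]


def depthwise_conv(x: Tensor3, kernels: Tensor3) -> Tensor3:
    """Apply one KxK filter per input channel (stride 1, no padding; hypothetical setup)."""
    k = len(kernels[0])
    h_out = len(x[0]) - k + 1
    w_out = len(x[0][0]) - k + 1
    out = []
    for ch, ker in zip(x, kernels):
        # Each channel is filtered independently; channels are never mixed here.
        plane = [[sum(ch[i + di][j + dj] * ker[di][dj]
                      for di in range(k) for dj in range(k))
                  for j in range(w_out)]
                 for i in range(h_out)]
        out.append(plane)
    return out


def pointwise_conv(x: Tensor3, weights: List[List[float]]) -> Tensor3:
    """1x1 convolution: each output channel is a weighted sum of the input channels."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wc * x[c][i][j] for c, wc in enumerate(row))
              for j in range(w)]
             for i in range(h)]
            for row in weights]


def ds_conv(x: Tensor3, dw_kernels: Tensor3, pw_weights: List[List[float]]) -> Tensor3:
    """Depthwise-separable convolution = depthwise step, then pointwise step."""
    return pointwise_conv(depthwise_conv(x, dw_kernels), pw_weights)


def macs(h_out: int, w_out: int, k: int, c_in: int, c_out: int) -> Tuple[int, int]:
    """MACs of a standard KxK convolution vs. its depthwise-separable factorization."""
    standard = h_out * w_out * k * k * c_in * c_out
    separable = h_out * w_out * (k * k * c_in + c_in * c_out)
    return standard, separable


# Tiny illustrative input: 2 channels of 3x3.
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
dw = [[[1, 0], [0, 1]],   # channel 0: sum of a diagonal pair
      [[1, 1], [1, 1]]]   # channel 1: sum of each 2x2 window
pw = [[1, 1], [1, -1]]    # two output channels: ch0 + ch1 and ch0 - ch1

y = ds_conv(x, dw, pw)
print(y)                        # [[[10, 12], [16, 18]], [[2, 4], [8, 10]]]
print(macs(25, 5, 3, 64, 64))   # (4608000, 584000)
```

With the illustrative sizes above (3x3 kernels, 64 input and 64 output channels), the factorized form needs roughly 7.9× fewer MACs, since the cost ratio is 1/c_out + 1/k². This reduction in arithmetic and weight storage is why DSCNNs are a common choice for low-power KWS; the actual savings on this chip depend on its layer dimensions, which the abstract does not state.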
About the journal:
The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chip and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.
The IEEE Transactions on VLSI Systems was founded to address this critical area through a common forum. The editorial board, consisting of international experts, invites original papers that emphasize novel systems-integration aspects of microelectronic systems, including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chip and wafer fabrication, testing and packaging, and system-level qualification. Thus, the coverage of these Transactions will focus on VLSI/ULSI microelectronic systems integration.