An On-Chip-Training Keyword-Spotting Chip Using Interleaved Pipeline and Computation-in-Memory Cluster in 28-nm CMOS

IF 2.8 | Region 2, Engineering & Technology | Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Junyi Qian, Cai Li, Long Chen, Ruidong Li, Tuo Li, Peng Cao, Xin Si, Weiwei Shan
{"title":"基于28纳米CMOS交错管道和内存计算集群的片上训练关键字定位芯片","authors":"Junyi Qian;Cai Li;Long Chen;Ruidong Li;Tuo Li;Peng Cao;Xin Si;Weiwei Shan","doi":"10.1109/TVLSI.2025.3525740","DOIUrl":null,"url":null,"abstract":"To improve the precision of keyword spotting (KWS) for individual users on edge devices, we propose an on-chip-training KWS (OCT-KWS) chip for private data protection while also achieving ultralow -power inference. Our main contributions are: 1) identity interchange and interleaved pipeline methods during backpropagation (BP), enabling the pipelined execution of operations that traditionally had to be performed sequentially, reducing cache requirements for loss values by 95.8%; 2) all-digital isolated-bitline (BL)-based computation-in-memory (CIM) macro, eliminating ineffective computations caused by glitches, achieving <inline-formula> <tex-math>$2.03\\times $ </tex-math></inline-formula> higher energy efficiency; and 3) multisize CIM cluster-based BP data flow, designing each CIM macro collaboratively to achieve all-time full utilization, reducing 47.2% of output feature map (Ofmap) access. Fabricated in 28-nm CMOS and enhanced with a refined library characterization methodology, this chip achieves both the highest training energy efficiency of 101.5 TOPS/W and the lowest inference energy of 9.9nJ/decision among current KWS chips. By retraining a three-class depthwise-separable convolutional neural network (DSCNN), detection accuracy on the private dataset increases from 80.8% to 98.9%.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 5","pages":"1497-1501"},"PeriodicalIF":2.8000,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An On-Chip-Training Keyword-Spotting Chip Using Interleaved Pipeline and Computation-in-Memory Cluster in 28-nm CMOS\",\"authors\":\"Junyi Qian;Cai Li;Long Chen;Ruidong Li;Tuo Li;Peng Cao;Xin Si;Weiwei Shan\",\"doi\":\"10.1109/TVLSI.2025.3525740\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"To improve the precision of keyword spotting (KWS) for individual users on edge devices, we propose an on-chip-training KWS (OCT-KWS) chip for private data protection while also achieving ultralow -power inference. Our main contributions are: 1) identity interchange and interleaved pipeline methods during backpropagation (BP), enabling the pipelined execution of operations that traditionally had to be performed sequentially, reducing cache requirements for loss values by 95.8%; 2) all-digital isolated-bitline (BL)-based computation-in-memory (CIM) macro, eliminating ineffective computations caused by glitches, achieving <inline-formula> <tex-math>$2.03\\\\times $ </tex-math></inline-formula> higher energy efficiency; and 3) multisize CIM cluster-based BP data flow, designing each CIM macro collaboratively to achieve all-time full utilization, reducing 47.2% of output feature map (Ofmap) access. Fabricated in 28-nm CMOS and enhanced with a refined library characterization methodology, this chip achieves both the highest training energy efficiency of 101.5 TOPS/W and the lowest inference energy of 9.9nJ/decision among current KWS chips. 
By retraining a three-class depthwise-separable convolutional neural network (DSCNN), detection accuracy on the private dataset increases from 80.8% to 98.9%.\",\"PeriodicalId\":13425,\"journal\":{\"name\":\"IEEE Transactions on Very Large Scale Integration (VLSI) Systems\",\"volume\":\"33 5\",\"pages\":\"1497-1501\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2025-01-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Very Large Scale Integration (VLSI) Systems\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10838343/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10838343/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

To improve the precision of keyword spotting (KWS) for individual users on edge devices, we propose an on-chip-training KWS (OCT-KWS) chip that protects private data while also achieving ultralow-power inference. Our main contributions are: 1) identity-interchange and interleaved-pipeline methods during backpropagation (BP), enabling pipelined execution of operations that traditionally had to be performed sequentially and reducing the cache requirement for loss values by 95.8%; 2) an all-digital isolated-bitline (BL)-based computation-in-memory (CIM) macro that eliminates ineffective computations caused by glitches, achieving 2.03× higher energy efficiency; and 3) a multisize CIM-cluster-based BP data flow in which the CIM macros are designed collaboratively to sustain full utilization at all times, reducing output feature map (Ofmap) accesses by 47.2%. Fabricated in 28-nm CMOS and enhanced with a refined library characterization methodology, the chip achieves both the highest training energy efficiency (101.5 TOPS/W) and the lowest inference energy (9.9 nJ/decision) among current KWS chips. By retraining a three-class depthwise-separable convolutional neural network (DSCNN), detection accuracy on the private dataset increases from 80.8% to 98.9%.
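Since the abstract only names the network, a brief software-side sketch may help readers picture what retraining a three-class DSCNN on private keyword data involves. Below is a minimal illustrative model in PyTorch, assuming MFCC-like input features of shape (batch, 1, time_frames, mfcc_bins); the layer widths, block count, optimizer settings, and the `private_loader` dataset handle are hypothetical and are not the configuration used on the OCT-KWS chip, which performs the equivalent training on-chip with its CIM cluster.

```python
# Minimal sketch of a three-class depthwise-separable CNN (DSCNN) keyword
# classifier and a retraining loop on private data. All dimensions and
# hyperparameters below are illustrative assumptions, not the chip's.
import torch
import torch.nn as nn


class DSConvBlock(nn.Module):
    """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))


class DSCNN(nn.Module):
    """Small DSCNN: standard-conv stem, DS blocks, global pooling, classifier."""
    def __init__(self, num_classes=3, channels=64, num_blocks=4):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(1, channels, 3, stride=2, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )
        self.blocks = nn.Sequential(
            *[DSConvBlock(channels, channels) for _ in range(num_blocks)])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, x):          # x: (batch, 1, time_frames, mfcc_bins)
        x = self.pool(self.blocks(self.stem(x))).flatten(1)
        return self.fc(x)


def retrain(model, private_loader, epochs=10, lr=1e-3):
    """Fine-tune a pretrained model on a user's private recordings.
    `private_loader` is a hypothetical DataLoader yielding (features, label)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for feats, labels in private_loader:
            opt.zero_grad()
            loss = loss_fn(model(feats), labels)
            loss.backward()  # backpropagation: the stage the chip pipelines
            opt.step()
    return model
```

On the chip, the `loss.backward()` stage corresponds to the BP phase that the identity-interchange and interleaved-pipeline methods accelerate; those hardware dataflow details are not modeled in this sketch.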
Source journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems
CiteScore: 6.40
Self-citation rate: 7.10%
Articles published: 187
Review time: 3.6 months
Journal description: The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society. Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels. To address this critical area through a common forum, the IEEE Transactions on VLSI Systems have been founded. The editorial board, consisting of international experts, invites original papers which emphasize and merit the novel systems integration aspects of microelectronic systems including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and systems level qualification. Thus, the coverage of these Transactions will focus on VLSI/ULSI microelectronic systems integration.