14.1 A 510nW 0.41V Low-Memory Low-Computation Keyword-Spotting Chip Using Serial FFT-Based MFCC and Binarized Depthwise Separable Convolutional Neural Network in 28nm CMOS
Weiwei Shan, Minhao Yang, Jiaming Xu, Yicheng Lu, Shuai Zhang, Tao Wang, Jun Yang, Longxing Shi, Mingoo Seok
{"title":"14.1 A 510nW 0.41V Low-Memory Low-Computation Keyword-Spotting Chip Using Serial FFT-Based MFCC and Binarized Depthwise Separable Convolutional Neural Network in 28nm CMOS","authors":"Weiwei Shan, Minhao Yang, Jiaming Xu, Yicheng Lu, Shuai Zhang, Tao Wang, Jun Yang, Longxing Shi, Mingoo Seok","doi":"10.1109/ISSCC19947.2020.9063000","DOIUrl":null,"url":null,"abstract":"Ultra-low power is a strong requirement for always-on speech interfaces in wearable and mobile devices, such as Voice Activity Detection (VAD) and Keyword Spotting (KWS) [1]–[5]. A KWS system is used to detect specific wake-up words by speakers and has to be always on. Previous ASICs for KWS lack energy-efficient implementations having power $< 5\\mu \\mathrm{W}$. For example, deep neural network (DNN)-based KWS [1] has a large on-chip weight memory of 270KB and consumes $288\\mu \\mathrm{W}$. A binarized convolutional neural network (CNN) used 52KB of SRAM, $141\\mu \\mathrm{W}$ wakeup power at 2.5MHz, 0.57V [2]. An LSTM-based SoC used 105KB of SRAM and reduced power to $16.11\\mu\\mathrm{W}$ for KWS with 90.8% accuracy on the Google Speech Command Dataset (GSCD) [3]. Laika reduced power to $5\\mu \\mathrm{W}$ [4], not including the Mel Frequency Cepstrum Coefficient (MFCC) circuit. High compute and memory requirements have prevented always-on KWS chips from operating in the $\\mathrm{sub}-\\mu \\mathrm{W}$ range.","PeriodicalId":178871,"journal":{"name":"2020 IEEE International Solid- State Circuits Conference - (ISSCC)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"30","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Solid- State Circuits Conference - (ISSCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSCC19947.2020.9063000","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 30
Abstract
Ultra-low power is a strong requirement for always-on speech interfaces in wearable and mobile devices, such as Voice Activity Detection (VAD) and Keyword Spotting (KWS) [1]–[5]. A KWS system is used to detect specific wake-up words by speakers and has to be always on. Previous ASICs for KWS lack energy-efficient implementations having power $< 5\mu \mathrm{W}$. For example, deep neural network (DNN)-based KWS [1] has a large on-chip weight memory of 270KB and consumes $288\mu \mathrm{W}$. A binarized convolutional neural network (CNN) used 52KB of SRAM, $141\mu \mathrm{W}$ wakeup power at 2.5MHz, 0.57V [2]. An LSTM-based SoC used 105KB of SRAM and reduced power to $16.11\mu\mathrm{W}$ for KWS with 90.8% accuracy on the Google Speech Command Dataset (GSCD) [3]. Laika reduced power to $5\mu \mathrm{W}$ [4], not including the Mel Frequency Cepstrum Coefficient (MFCC) circuit. High compute and memory requirements have prevented always-on KWS chips from operating in the $\mathrm{sub}-\mu \mathrm{W}$ range.