Yuriy Mishchenko, Yusuf Goren, Ming Sun, Chris Beauchene, Spyros Matsoukas, Oleg Rybakov, S. Vitaladevuni
{"title":"Low-Bit Quantization and Quantization-Aware Training for Small-Footprint Keyword Spotting","authors":"Yuriy Mishchenko, Yusuf Goren, Ming Sun, Chris Beauchene, Spyros Matsoukas, Oleg Rybakov, S. Vitaladevuni","doi":"10.1109/ICMLA.2019.00127","DOIUrl":null,"url":null,"abstract":"In this paper, we investigate novel quantization approaches to reduce memory and computational footprint of deep neural network (DNN) based keyword spotters (KWS). We propose a new method for KWS offline and online quantization, which we call dynamic quantization, where we quantize DNN weight matrices column-wise, using each column's exact individual min-max range, and the DNN layers' inputs and outputs are quantized for every input audio frame individually, using the exact min-max range of each input and output vector. We further apply a new quantization-aware training approach that allows us to incorporate quantization errors into KWS model during training. Together, these approaches allow us to significantly improve the performance of KWS in 4-bit and 8-bit quantized precision, achieving the end-to-end accuracy close to that of full precision models while reducing the models' on-device memory footprint by up to 80%.","PeriodicalId":436714,"journal":{"name":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA.2019.00127","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 19
Abstract
In this paper, we investigate novel quantization approaches to reduce the memory and computational footprint of deep neural network (DNN) based keyword spotters (KWS). We propose a new method for KWS offline and online quantization, which we call dynamic quantization, in which DNN weight matrices are quantized column-wise, using each column's exact individual min-max range, and the DNN layers' inputs and outputs are quantized individually for every input audio frame, using the exact min-max range of each input and output vector. We further apply a new quantization-aware training approach that allows us to incorporate quantization errors into the KWS model during training. Together, these approaches allow us to significantly improve the performance of KWS at 4-bit and 8-bit quantized precision, achieving end-to-end accuracy close to that of full-precision models while reducing the models' on-device memory footprint by up to 80%.
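To make the column-wise and dynamic quantization scheme described in the abstract concrete, below is a minimal NumPy sketch: weight matrices are quantized per column with each column's exact min-max range, and an activation vector is quantized with its own range, recomputed for every input frame. The function names, the uint8 storage, and the epsilon guard are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def quantize_columnwise(W, num_bits=4):
    """Quantize a weight matrix column by column, using each column's
    exact min-max range (illustrative sketch, not the paper's code)."""
    levels = 2 ** num_bits - 1
    w_min = W.min(axis=0, keepdims=True)            # per-column minimum
    w_max = W.max(axis=0, keepdims=True)            # per-column maximum
    scale = np.maximum(w_max - w_min, 1e-8) / levels
    q = np.round((W - w_min) / scale)               # integer codes in [0, levels]
    return q.astype(np.uint8), scale, w_min

def dequantize_columnwise(q, scale, w_min):
    """Recover an approximate float matrix from the per-column codes."""
    return q.astype(np.float32) * scale + w_min

def quantize_dynamic(x, num_bits=8):
    """Quantize one activation vector with its own exact min-max range,
    recomputed for every incoming audio frame (dynamic quantization)."""
    levels = 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = max(x_max - x_min, 1e-8) / levels
    q = np.round((x - x_min) / scale)
    return q.astype(np.uint8), scale, x_min

def fake_quantize(W, num_bits=4):
    """Quantize-dequantize pass of the kind used in quantization-aware
    training, so the forward pass sees the quantization error; routing
    gradients to the float weights (straight-through estimator) would be
    handled by an autograd framework and is omitted here."""
    q, scale, w_min = quantize_columnwise(W, num_bits)
    return dequantize_columnwise(q, scale, w_min)
```

As a usage example, calling `fake_quantize(W, num_bits=4)` inside the training loop exposes the model to 4-bit rounding error, while `quantize_columnwise` and `quantize_dynamic` correspond to the offline (weights) and online (per-frame activations) halves of the deployment path.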