Low-Bit Quantization and Quantization-Aware Training for Small-Footprint Keyword Spotting

Yuriy Mishchenko, Yusuf Goren, Ming Sun, Chris Beauchene, Spyros Matsoukas, Oleg Rybakov, S. Vitaladevuni
{"title":"Low-Bit Quantization and Quantization-Aware Training for Small-Footprint Keyword Spotting","authors":"Yuriy Mishchenko, Yusuf Goren, Ming Sun, Chris Beauchene, Spyros Matsoukas, Oleg Rybakov, S. Vitaladevuni","doi":"10.1109/ICMLA.2019.00127","DOIUrl":null,"url":null,"abstract":"In this paper, we investigate novel quantization approaches to reduce memory and computational footprint of deep neural network (DNN) based keyword spotters (KWS). We propose a new method for KWS offline and online quantization, which we call dynamic quantization, where we quantize DNN weight matrices column-wise, using each column's exact individual min-max range, and the DNN layers' inputs and outputs are quantized for every input audio frame individually, using the exact min-max range of each input and output vector. We further apply a new quantization-aware training approach that allows us to incorporate quantization errors into KWS model during training. Together, these approaches allow us to significantly improve the performance of KWS in 4-bit and 8-bit quantized precision, achieving the end-to-end accuracy close to that of full precision models while reducing the models' on-device memory footprint by up to 80%.","PeriodicalId":436714,"journal":{"name":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA.2019.00127","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 19

Abstract

In this paper, we investigate novel quantization approaches to reduce the memory and computational footprint of deep neural network (DNN) based keyword spotters (KWS). We propose a new method for offline and online KWS quantization, which we call dynamic quantization: DNN weight matrices are quantized column-wise, using each column's exact individual min-max range, and the DNN layers' inputs and outputs are quantized individually for every input audio frame, using the exact min-max range of each input and output vector. We further apply a new quantization-aware training approach that allows us to incorporate quantization errors into the KWS model during training. Together, these approaches allow us to significantly improve the performance of KWS at 4-bit and 8-bit quantized precision, achieving end-to-end accuracy close to that of full-precision models while reducing the models' on-device memory footprint by up to 80%.
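The column-wise and per-frame min-max mapping described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the asymmetric (min-max offset) integer mapping, and the int32 storage type are illustrative choices of our own.

```python
import numpy as np

def quantize_per_column(W, n_bits=8):
    """Quantize a weight matrix column-wise, each column using its own
    exact min-max range, so one outlier column does not widen the
    quantization step of the others."""
    levels = 2 ** n_bits - 1                      # 255 for 8-bit, 15 for 4-bit
    w_min = W.min(axis=0, keepdims=True)          # exact per-column minimum
    w_max = W.max(axis=0, keepdims=True)          # exact per-column maximum
    scale = (w_max - w_min) / levels              # per-column step size
    scale = np.where(scale == 0.0, 1.0, scale)    # guard against constant columns
    q = np.round((W - w_min) / scale).astype(np.int32)  # codes in [0, levels]
    return q, scale, w_min

def quantize_vector(x, n_bits=8):
    """Dynamic per-frame quantization of a layer input/output vector,
    using that vector's exact min-max range computed at run time."""
    levels = 2 ** n_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / levels
    if scale == 0.0:
        scale = 1.0
    q = np.round((x - x_min) / scale).astype(np.int32)
    return q, scale, x_min

def dequantize(q, scale, offset):
    """Map integer codes back to approximate float values."""
    return q * scale + offset
```

For example, `W_hat = dequantize(*quantize_per_column(W, n_bits=4))` reconstructs the weights with an element-wise error bounded by half of that column's step size, which is the property the per-column (rather than per-matrix) range is meant to tighten.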
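On the training side, a common way to incorporate quantization error into a model during training is to pass weights (and optionally activations) through a quantize-dequantize step in the forward pass. The sketch below shows that generic "fake quantization" step only; the paper's exact quantization-aware training formulation is not reproduced here, and `fake_quantize` is a hypothetical helper.

```python
import numpy as np

def fake_quantize(x, n_bits=4):
    """Quantize then immediately dequantize: the output is float but
    carries the same rounding error the low-bit deployment will see."""
    levels = 2 ** n_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / levels
    if scale == 0.0:
        scale = 1.0
    return np.round((x - x_min) / scale) * scale + x_min

# In a quantization-aware forward pass, the layer computes with
# fake_quantize(W) instead of W, so the training loss reflects
# quantization error; autodiff frameworks typically back-propagate
# through the rounding with a straight-through estimator (treating
# round() as the identity).
```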