Low-Bit Quantization and Quantization-Aware Training for Small-Footprint Keyword Spotting

Yuriy Mishchenko, Yusuf Goren, Ming Sun, Chris Beauchene, Spyros Matsoukas, Oleg Rybakov, S. Vitaladevuni

2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), December 2019. DOI: 10.1109/ICMLA.2019.00127
In this paper, we investigate novel quantization approaches to reduce the memory and computational footprint of deep neural network (DNN) based keyword spotters (KWS). We propose a new method for offline and online quantization of KWS models, which we call dynamic quantization: DNN weight matrices are quantized column-wise, using each column's exact individual min-max range, and the DNN layers' inputs and outputs are quantized for every input audio frame individually, using the exact min-max range of each input and output vector. We further apply a new quantization-aware training approach that allows us to incorporate quantization errors into the KWS model during training. Together, these approaches significantly improve the performance of KWS at 4-bit and 8-bit quantized precision, achieving end-to-end accuracy close to that of full-precision models while reducing the models' on-device memory footprint by up to 80%.
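To make the column-wise and per-frame min-max quantization described above concrete, the following is a minimal NumPy sketch of that idea. It assumes a simple asymmetric uniform quantizer derived from each column's or vector's exact min-max range; the function names and the exact zero-point handling are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def quantize_minmax(x, num_bits=8):
    """Uniformly quantize x to num_bits using its exact min-max range.

    Returns integer codes plus the (scale, zero_point) needed to dequantize.
    (Assumed asymmetric scheme; details may differ from the paper.)
    """
    qmax = 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / qmax if x_max > x_min else 1.0
    zero_point = x_min
    q = np.round((x - zero_point) / scale).astype(np.int32)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return q.astype(np.float32) * scale + zero_point

def quantize_weights_columnwise(W, num_bits=4):
    """Quantize each weight-matrix column with its own min-max range."""
    cols = [quantize_minmax(W[:, j], num_bits) for j in range(W.shape[1])]
    return np.stack([dequantize(q, s, z) for q, s, z in cols], axis=1)

def quantize_activation(x, num_bits=8):
    """Per-frame activation quantization: each input vector gets its own range."""
    q, s, z = quantize_minmax(x, num_bits)
    return dequantize(q, s, z)

# Example: 4-bit weights, 8-bit activations for one dense layer on one audio frame.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128)).astype(np.float32)
x = rng.standard_normal(64).astype(np.float32)
y = quantize_activation(x) @ quantize_weights_columnwise(W, num_bits=4)
```

Because every column and every frame uses its own exact range, no calibration dataset or global range statistics are needed, which is what allows the quantization to run both offline (weights) and online (activations, per frame).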