内存关键字识别模型的知识精馏

Interspeech Pub Date : 2022-09-18 DOI:10.21437/interspeech.2022-633

Zeyang Song, Qi Liu, Qu Yang, Haizhou Li

{"title":"内存关键字识别模型的知识精馏","authors":"Zeyang Song, Qi Liu, Qu Yang, Haizhou Li","doi":"10.21437/interspeech.2022-633","DOIUrl":null,"url":null,"abstract":"We study a light-weight implementation of keyword spotting (KWS) for voice command and control, that can be implemented on an in-memory computing (IMC) unit with same accuracy at a lower computational cost than the state-of-the-art methods. KWS is expected to be always-on for mobile devices with limited resources. IMC represents one of the solutions. However, it only supports multiplication-accumulation and Boolean operations. We note that common feature extraction methods, such as MFCC and SincConv, are not supported by IMC as they depend on expensive logarithm computing. On the other hand, some neural network solutions to KWS involve a large number of parameters that are not feasible for mobile devices. In this work, we propose a knowledge distillation technique to replace the complex speech frontend like MFCC or SincConv with a light-weight encoder without performance loss. Experiments show that the proposed model outperforms the KWS model with MFCC and SincConv front-end in terms of accuracy and computational cost.","PeriodicalId":73500,"journal":{"name":"Interspeech","volume":"1 1","pages":"4128-4132"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Knowledge distillation for In-memory keyword spotting model\",\"authors\":\"Zeyang Song, Qi Liu, Qu Yang, Haizhou Li\",\"doi\":\"10.21437/interspeech.2022-633\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We study a light-weight implementation of keyword spotting (KWS) for voice command and control, that can be implemented on an in-memory computing (IMC) unit with same accuracy at a lower computational cost than the state-of-the-art methods. KWS is expected to be always-on for mobile devices with limited resources. IMC represents one of the solutions. However, it only supports multiplication-accumulation and Boolean operations. We note that common feature extraction methods, such as MFCC and SincConv, are not supported by IMC as they depend on expensive logarithm computing. On the other hand, some neural network solutions to KWS involve a large number of parameters that are not feasible for mobile devices. In this work, we propose a knowledge distillation technique to replace the complex speech frontend like MFCC or SincConv with a light-weight encoder without performance loss. Experiments show that the proposed model outperforms the KWS model with MFCC and SincConv front-end in terms of accuracy and computational cost.\",\"PeriodicalId\":73500,\"journal\":{\"name\":\"Interspeech\",\"volume\":\"1 1\",\"pages\":\"4128-4132\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Interspeech\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21437/interspeech.2022-633\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Interspeech","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/interspeech.2022-633","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

我们研究了一种用于语音命令和控制的关键词识别（KWS）的轻量级实现，该实现可以在内存计算（IMC）单元上以与最先进的方法相比更低的计算成本实现，具有相同的精度。对于资源有限的移动设备，KWS预计将一直处于开启状态。IMC代表了其中一种解决方案。但是，它只支持乘法累加和布尔运算。我们注意到，常见的特征提取方法，如MFCC和SincConv，不受IMC的支持，因为它们依赖于昂贵的对数计算。另一方面，KWS的一些神经网络解决方案涉及大量参数，这些参数对于移动设备来说是不可行的。在这项工作中，我们提出了一种知识提取技术，用一种没有性能损失的轻量级编码器来取代像MFCC或SincConv这样的复杂语音前端。实验表明，该模型在精度和计算成本方面优于具有MFCC和SincConv前端的KWS模型。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Knowledge distillation for In-memory keyword spotting model

We study a light-weight implementation of keyword spotting (KWS) for voice command and control, that can be implemented on an in-memory computing (IMC) unit with same accuracy at a lower computational cost than the state-of-the-art methods. KWS is expected to be always-on for mobile devices with limited resources. IMC represents one of the solutions. However, it only supports multiplication-accumulation and Boolean operations. We note that common feature extraction methods, such as MFCC and SincConv, are not supported by IMC as they depend on expensive logarithm computing. On the other hand, some neural network solutions to KWS involve a large number of parameters that are not feasible for mobile devices. In this work, we propose a knowledge distillation technique to replace the complex speech frontend like MFCC or SincConv with a light-weight encoder without performance loss. Experiments show that the proposed model outperforms the KWS model with MFCC and SincConv front-end in terms of accuracy and computational cost.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Interspeech

自引率

0.00%

发文量