Yuzhong Jiao, Yiu Kei Li, C. Chan, Yun Li, Zhilin Ai
2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), published 2022-11-11. DOI: https://doi.org/10.1109/APCCAS55924.2022.10090286
Thermometer Code of Log Mel-Frequency Spectral Coefficient for BNN-based Keyword Spotting System
For keyword spotting (KWS) systems, which typically run on mobile devices, a low-complexity design is essential for long standby time. Audio feature extraction and classifier modeling are the two main components of a KWS system. The log Mel-frequency spectral coefficient (MFSC) is a common audio feature because of its low complexity and good performance. A binary neural network (BNN) classifier, which has binary weights and activations and performs convolution with XNOR operations, is well suited to low-complexity KWS applications. However, audio features are usually quantized with a multi-bit binary code to maintain high classification accuracy, which requires addition (ADD) operations in the first convolutional layer of the BNN model. The BNN accelerator therefore needs both XNOR and ADD units. To further reduce the complexity of KWS systems, we propose a new feature extraction method: Thermometer Codes of MFSC (MFSC-TC). Because it needs no LOG or DELTA operations, it is simpler than other MFSC-based methods. More importantly, the properties of the thermometer code allow the convolutions of all layers to be performed by XNOR units. Experiments on the Google Speech Commands dataset show that MFSC-TC-based BNN models outperform models that have more layers and use other feature extraction methods.
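To make the XNOR-only claim concrete, the following is a minimal sketch of generic thermometer coding followed by an XNOR-popcount dot product. It is not the authors' implementation: the abstract does not give the number of code bits, the thresholds, or the value range, so the uniform thresholds and the function names `thermometer_encode` and `xnor_popcount_dot` are assumptions made for illustration. The point it illustrates is that every bit of a thermometer code carries equal weight, so a first-layer dot product can be computed like any other binary layer, whereas a positional binary code would need bit weights of 2^k and therefore weighted additions.

```python
import numpy as np


def thermometer_encode(x, num_levels, lo, hi):
    """Map each value in x to a num_levels-bit thermometer code of 0/1 bits.

    Thresholds are spaced uniformly inside (lo, hi). This spacing is an
    assumption for illustration; the source does not specify it.
    """
    x = np.asarray(x, dtype=np.float64)
    steps = np.arange(1, num_levels + 1) / (num_levels + 1)
    thresholds = lo + (hi - lo) * steps
    # Bit k is 1 when the value reaches threshold k, so ones fill from the left.
    return (x[..., None] >= thresholds).astype(np.uint8)


def xnor_popcount_dot(a_bits, w_bits):
    """Dot product of two {-1, +1} vectors stored as {0, 1} bits.

    XNOR marks positions where the bits agree; each agreement adds +1 and
    each disagreement adds -1, giving 2 * matches - length.
    """
    a_bits = np.asarray(a_bits, dtype=np.uint8).ravel()
    w_bits = np.asarray(w_bits, dtype=np.uint8).ravel()
    matches = int(np.count_nonzero(a_bits == w_bits))  # XNOR, then popcount
    return 2 * matches - a_bits.size


if __name__ == "__main__":
    # Hypothetical feature values already scaled to [0, 1].
    features = [0.0, 0.5, 1.0]
    codes = thermometer_encode(features, num_levels=3, lo=0.0, hi=1.0)
    print(codes)

    # Hypothetical binary weights covering all 3 x 3 code bits.
    weights = np.array([1, 0, 0, 1, 0, 1, 1, 1, 0], dtype=np.uint8)
    print(xnor_popcount_dot(codes, weights))
```

The cost of this choice is width: a thermometer code with N levels needs N bits per feature rather than roughly log2(N), so the first layer sees more input bits in exchange for dropping the ADD units.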