Lian Meirong, Zhang Shaoying, Cheng Chuanxu, Xu Wen
2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP), published 2021-04-09. DOI: https://doi.org/10.1109/ICSP51882.2021.9408857
Query-by-Example on-Device Keyword Spotting using Convolutional Recurrent Neural Network and Connectionist Temporal Classification
Keyword spotting (KWS) is an essential feature for speech-based applications on mobile devices. To reduce power consumption and improve the robustness of KWS systems to non-standard pronunciations, this paper proposes a query-by-example on-device keyword spotting system using a Convolutional Recurrent Neural Network (CRNN) and Connectionist Temporal Classification (CTC). The CRNN directly predicts phoneme posterior probabilities, and CTC computes scores for the output phoneme sequences. To reduce computational cost, the CRNN-based model is then simplified, and a template generator based on Dynamic Time Warping (DTW) is built to generate keyword templates. The proposed KWS system has low computational requirements and is well suited to both enrollment and inference on low-power devices. Its performance is competitive with other query-by-example systems, and it meets commercial application standards even in noisy or far-field conditions.
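To make the CTC scoring step concrete, here is a minimal sketch of how a phoneme sequence can be scored against frame-level posteriors using the standard CTC forward recursion. It is an illustration only, not the authors' implementation: the function name `ctc_log_score`, the choice of index 0 for the blank symbol, and the toy posteriors in the usage note are all assumptions.

```python
import math

NEG_INF = float("-inf")


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_log_score(log_probs, labels, blank=0):
    """Log-probability of `labels` under CTC, summed over all alignments.

    log_probs: list of T frames, each a list of log posteriors per symbol
               (hypothetical stand-in for the CRNN output).
    labels:    phoneme indices of the candidate sequence, without blanks.
    blank:     index of the CTC blank symbol (assumed to be 0 here).
    """
    # Extended sequence with blanks interleaved: [b, l1, b, l2, ..., b]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    size = len(ext)

    # Initialisation: a path may start in the first blank or the first label.
    alpha = [NEG_INF] * size
    alpha[0] = log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new_alpha = [NEG_INF] * size
        for s in range(size):
            candidates = [alpha[s]]                 # stay on the same symbol
            if s >= 1:
                candidates.append(alpha[s - 1])     # advance by one
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = _logsumexp(candidates) + log_probs[t][ext[s]]
        alpha = new_alpha

    # Valid paths end in the last label or the trailing blank.
    if size > 1:
        return _logsumexp([alpha[-1], alpha[-2]])
    return alpha[-1]
```

As a toy usage, with two frames of uniform posteriors over {blank, phoneme 1}, `ctc_log_score([[math.log(0.5)] * 2] * 2, [1])` sums the three alignments that collapse to the single phoneme. In a query-by-example setting, such a score could be compared against a threshold to accept or reject a keyword hypothesis; how the paper sets that decision rule is not stated in the abstract.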