Efficient Free Keyword Detection Based on CNN and End-to-End Continuous DP-Matching
Tomohiro Tanaka, T. Shinozaki
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2019
DOI: 10.1109/ASRU46091.2019.9004021
Citations: 3
Abstract
For continuous keyword detection, the advantage of dynamic programming (DP) matching is that it can detect any keyword without retraining the system. Previous research reported higher detection accuracy for 2D-RNN-based DP matching than for conventional DP matching and embedding-based methods. However, 2D-RNN-based DP matching has a high computational cost. To address this problem, we combine a convolutional neural network (CNN) and 2D-RNN-based DP matching into a unified framework that reduces the computational cost by a polynomial-order factor depending on the kernel size and the number of CNN layers. Experiments using the Google Speech Commands Dataset and noise data from the CHiME-3 challenge demonstrate that the proposed model improves open keyword detection performance compared with an embedding-based baseline system, while running nine times faster than the previous 2D-RNN DP matching.
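To make the DP-matching idea concrete, here is a minimal sketch of conventional continuous DP matching (subsequence DTW with a free start point) over frame-level features, plus a crude average-pooling step standing in for a CNN front end. This is not the paper's 2D-RNN model: the Euclidean local distance, the length normalization, the `average_pool` helper, and all other names are hypothetical illustrations. The sketch shows why any keyword can be detected without retraining (the keyword is only a template sequence) and why shortening both sequences along time shrinks the query-length × input-length DP grid.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one feature vector (e.g. a filterbank frame)


def continuous_dp(query: List[Frame], stream: List[Frame]) -> List[float]:
    """Continuous DP matching (subsequence DTW with a free start point).

    For each stream frame j, returns the length-normalized cost of the best
    alignment of the whole query that ends at j and may start at any earlier
    frame. Low scores indicate a likely keyword occurrence ending at j.
    """
    m, n = len(query), len(stream)
    inf = float("inf")
    cost = [[inf] * n for _ in range(m)]  # accumulated local distances
    steps = [[0] * n for _ in range(m)]   # path lengths, for normalization
    for j in range(n):                    # stream frames processed in time order
        for i in range(m):
            d = math.dist(query[i], stream[j])  # Euclidean local distance
            if i == 0:
                # Free start: the keyword may begin at any stream frame.
                cost[0][j], steps[0][j] = d, 1
                continue
            preds = [(cost[i - 1][j], steps[i - 1][j])]  # vertical step
            if j > 0:
                preds.append((cost[i][j - 1], steps[i][j - 1]))          # horizontal
                preds.append((cost[i - 1][j - 1], steps[i - 1][j - 1]))  # diagonal
            best, length = min(preds)
            cost[i][j], steps[i][j] = best + d, length + 1
    return [cost[m - 1][j] / steps[m - 1][j] for j in range(n)]


def average_pool(frames: List[Frame], width: int) -> List[List[float]]:
    """Non-overlapping average pooling along time (drops an incomplete tail).

    A crude, hypothetical stand-in for a strided CNN front end; it is not the
    learned network described in the paper.
    """
    pooled = []
    for start in range(0, len(frames) - width + 1, width):
        chunk = frames[start:start + width]
        pooled.append([sum(col) / width for col in zip(*chunk)])
    return pooled


if __name__ == "__main__":
    keyword = [[0.0], [1.0], [2.0], [1.0]]
    stream = ([[5.0]] * 6
              + [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [1.0], [1.0]]
              + [[5.0]] * 6)
    scores = continuous_dp(keyword, stream)
    print("best end frame:", min(range(len(scores)), key=scores.__getitem__))
    pq, ps = average_pool(keyword, 2), average_pool(stream, 2)
    print("DP cells:", len(keyword) * len(stream), "->", len(pq) * len(ps))
```

In the demo, pooling both sequences by a factor of 2 cuts the DP grid from 80 to 20 cells, at the price of coarser time resolution. In general, shrinking the query and the input by a factor r each reduces the grid by about r²; how the paper's kernel size and number of CNN layers translate into such a factor is not spelled out in the abstract, so treat this only as an intuition for the claimed polynomial-order saving.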