{"title":"基于CNN和端到端连续dp匹配的高效免费关键字检测","authors":"Tomohiro Tanaka, T. Shinozaki","doi":"10.1109/ASRU46091.2019.9004021","DOIUrl":null,"url":null,"abstract":"For continuous keyword detection, the advantage of dynamic programming (DP) matching is that it can detect any keyword without re-training the system. In previous research, higher detection accuracy was reported using 2D-RNN based DP matching than using conventional DP and embedding methods. However, 2D-RNN based DP matching has a high computational cost. In order to address this problem, we combine a convolutional neural network (CNN) and 2D-RNN based DP matching into a unified framework which, based on the kernel size and the number of CNN layers, has a polynomial order effect on reducing the computational cost. Experimental results, using Google Speech Commands Dataset and the CHiME-3 challenge's noise data, demonstrate that our proposed model improves open keyword detection performance, compared to the embedding-based baseline system, while it is nine times faster than previous 2D-RNN DP matching.","PeriodicalId":150913,"journal":{"name":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"306 3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Efficient Free Keyword Detection Based on CNN and End-to-End Continuous DP-Matching\",\"authors\":\"Tomohiro Tanaka, T. Shinozaki\",\"doi\":\"10.1109/ASRU46091.2019.9004021\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"For continuous keyword detection, the advantage of dynamic programming (DP) matching is that it can detect any keyword without re-training the system. In previous research, higher detection accuracy was reported using 2D-RNN based DP matching than using conventional DP and embedding methods. However, 2D-RNN based DP matching has a high computational cost. In order to address this problem, we combine a convolutional neural network (CNN) and 2D-RNN based DP matching into a unified framework which, based on the kernel size and the number of CNN layers, has a polynomial order effect on reducing the computational cost. Experimental results, using Google Speech Commands Dataset and the CHiME-3 challenge's noise data, demonstrate that our proposed model improves open keyword detection performance, compared to the embedding-based baseline system, while it is nine times faster than previous 2D-RNN DP matching.\",\"PeriodicalId\":150913,\"journal\":{\"name\":\"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"volume\":\"306 3 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU46091.2019.9004021\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU46091.2019.9004021","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Efficient Free Keyword Detection Based on CNN and End-to-End Continuous DP-Matching
For continuous keyword detection, the advantage of dynamic programming (DP) matching is that it can detect any keyword without re-training the system. In previous research, higher detection accuracy was reported using 2D-RNN based DP matching than using conventional DP and embedding methods. However, 2D-RNN based DP matching has a high computational cost. In order to address this problem, we combine a convolutional neural network (CNN) and 2D-RNN based DP matching into a unified framework which, based on the kernel size and the number of CNN layers, has a polynomial order effect on reducing the computational cost. Experimental results, using Google Speech Commands Dataset and the CHiME-3 challenge's noise data, demonstrate that our proposed model improves open keyword detection performance, compared to the embedding-based baseline system, while it is nine times faster than previous 2D-RNN DP matching.