{"title":"用于逐例查询搜索记录的噪声鲁棒样本匹配","authors":"Emre Yilmaz, Julien van Hout, H. Franco","doi":"10.1109/ASRU.2017.8268909","DOIUrl":null,"url":null,"abstract":"This paper describes a two-step approach for keyword spotting task in which a query-by-example (QbE) search is followed by noise robust exemplar matching (N-REM) rescoring. In the first stage, subsequence dynamic time warping is performed to detect keywords in search utterances. In the second stage, these target frame sequences are rescored using the reconstruction errors provided by the linear combination of the available exemplars extracted from the training data. Due to data sparsity, we align the target frame sequence and the exemplars to a common frame length and the exemplar weights are obtained by solving a convex optimization problem with nonnegative sparse coding. We run keyword spotting experiments on the Air Traffic Control (ATC) database and evaluate performance of multiple distance metrics for calculating the weights and reconstruction errors using convolutional neural network (CNN) bottleneck features. The results demonstrate that the proposed two-step keyword spotting approach provides better keyword detection compared to a baseline with only QbE search.","PeriodicalId":290868,"journal":{"name":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Noise-robust exemplar matching for rescoring query-by-example search\",\"authors\":\"Emre Yilmaz, Julien van Hout, H. Franco\",\"doi\":\"10.1109/ASRU.2017.8268909\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper describes a two-step approach for keyword spotting task in which a query-by-example (QbE) search is followed by noise robust exemplar matching (N-REM) rescoring. In the first stage, subsequence dynamic time warping is performed to detect keywords in search utterances. In the second stage, these target frame sequences are rescored using the reconstruction errors provided by the linear combination of the available exemplars extracted from the training data. Due to data sparsity, we align the target frame sequence and the exemplars to a common frame length and the exemplar weights are obtained by solving a convex optimization problem with nonnegative sparse coding. We run keyword spotting experiments on the Air Traffic Control (ATC) database and evaluate performance of multiple distance metrics for calculating the weights and reconstruction errors using convolutional neural network (CNN) bottleneck features. The results demonstrate that the proposed two-step keyword spotting approach provides better keyword detection compared to a baseline with only QbE search.\",\"PeriodicalId\":290868,\"journal\":{\"name\":\"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU.2017.8268909\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2017.8268909","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Noise-robust exemplar matching for rescoring query-by-example search
This paper describes a two-step approach for keyword spotting task in which a query-by-example (QbE) search is followed by noise robust exemplar matching (N-REM) rescoring. In the first stage, subsequence dynamic time warping is performed to detect keywords in search utterances. In the second stage, these target frame sequences are rescored using the reconstruction errors provided by the linear combination of the available exemplars extracted from the training data. Due to data sparsity, we align the target frame sequence and the exemplars to a common frame length and the exemplar weights are obtained by solving a convex optimization problem with nonnegative sparse coding. We run keyword spotting experiments on the Air Traffic Control (ATC) database and evaluate performance of multiple distance metrics for calculating the weights and reconstruction errors using convolutional neural network (CNN) bottleneck features. The results demonstrate that the proposed two-step keyword spotting approach provides better keyword detection compared to a baseline with only QbE search.