Kernel based Matching and a Novel training approach for CNN-based QbE-STD
P. Naik, M. Gaonkar, Veena Thenkanidiyoor, A. D. Dileep
2020 International Conference on Signal Processing and Communications (SPCOM), published 2020-07-01
DOI: 10.1109/SPCOM50965.2020.9179588 (https://doi.org/10.1109/SPCOM50965.2020.9179588)
Citation count: 1
Abstract
Query-by-example spoken term detection (QbE-STD) is an approach to audio search in which an audio query is matched against reference utterances to find the relevant ones. QbE-STD involves computing a matching matrix between a query and a reference utterance using a suitable metric. In this work, we propose kernel-based matching, using the histogram intersection kernel (HIK) as the matching metric. A CNN-based approach to QbE-STD first converts the matching matrix into a corresponding size-normalized image and then classifies the image as relevant or not [6]. We also propose to train the CNN-based classifier on whole size-normalized images instead of splitting them into subimages as in [6]. The proposed training approach is expected to be more effective because the CNN-based classifier is less likely to be confused. The effectiveness of the proposed kernel-based matching and the novel training approach is studied using the TIMIT dataset.
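
As a rough illustration of the matching step described in the abstract, the sketch below computes a matching matrix with the histogram intersection kernel, K(x, y) = sum_i min(x_i, y_i), and resamples it to a fixed size. Several details are assumptions made only for this example and are not taken from the paper: that each frame is a non-negative feature vector (for instance a posterior-like histogram), that size normalization is done by nearest-neighbour resampling, and the function names `hik_matching_matrix` and `size_normalize`, which are hypothetical.

```python
import numpy as np


def hik_matching_matrix(query, reference):
    """Histogram intersection kernel between every query/reference frame pair.

    query:     array of shape (m, d), non-negative frame features (assumed)
    reference: array of shape (n, d), non-negative frame features (assumed)
    returns:   array of shape (m, n) with entry [i, j] = sum_k min(q[i,k], r[j,k])
    """
    query = np.asarray(query, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Broadcast to (m, n, d), take the element-wise minimum, sum over features.
    return np.minimum(query[:, None, :], reference[None, :, :]).sum(axis=-1)


def size_normalize(matrix, out_rows, out_cols):
    """Resample a matching matrix to a fixed size.

    Nearest-neighbour index sampling is an assumption for illustration;
    the abstract does not state which normalization scheme is used.
    """
    matrix = np.asarray(matrix)
    rows = np.linspace(0, matrix.shape[0] - 1, out_rows).round().astype(int)
    cols = np.linspace(0, matrix.shape[1] - 1, out_cols).round().astype(int)
    return matrix[np.ix_(rows, cols)]


# Toy example: 2 query frames and 3 reference frames, each a 2-bin histogram.
query = [[0.5, 0.5], [1.0, 0.0]]
reference = [[0.5, 0.5], [0.0, 1.0], [1.0, 0.0]]
matching = hik_matching_matrix(query, reference)
image = size_normalize(matching, 4, 4)
```

In this toy setup the fixed-size `image` would be the input to a binary relevant/not-relevant classifier; the CNN itself is omitted because the abstract gives no architectural details.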