Author: Lukasz Laszko
Venue: 2015 Federated Conference on Computer Science and Information Systems (FedCSIS)
Published: 2015-11-09
DOI: 10.15439/2015F341 (https://doi.org/10.15439/2015F341)
Word detection in recorded speech using textual queries
The paper presents an unsupervised method for word detection in recorded speech signals. The method examines the signal similarity of two media descriptions: the recorded voice and a word (the textual query) synthesized with Text-to-Speech tools. Each description is a sequence of Mel-Frequency Cepstral Coefficients or Human-Factor Cepstral Coefficients. The Dynamic Time Warping algorithm is applied to align the two descriptions in time. Detection uses a classification method based on a cost function calculated from the signal similarity and the alignment path. Potential false matches are eliminated by comparing the costs of path subsequences to a threshold value. The results could encourage the development of affordable commercial or non-commercial solutions for specific and multilingual applications.
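The alignment and thresholding steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a plain subsequence variant of Dynamic Time Warping, a Euclidean frame distance, and a cost normalized by query length. The names `frame_distance`, `subsequence_dtw`, and `detect_word` are hypothetical. Feature extraction and speech synthesis are out of scope here, so each input is simply a list of feature vectors standing in for MFCC or HFCC frames.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (one frame each)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query, recording):
    """Align `query` with the best-matching stretch of `recording`.

    Both arguments are non-empty lists of equal-length feature vectors.
    Returns (normalized_cost, start_frame, end_frame), end inclusive.
    Normalizing by query length is an assumption of this sketch.
    """
    n, m = len(query), len(recording)
    inf = float("inf")
    # cost[i][j]: cheapest alignment of query[0..i] that ends at recording[j]
    cost = [[inf] * m for _ in range(n)]
    # start[i][j]: recording frame at which that alignment began
    start = [[0] * m for _ in range(n)]

    # Free start: the query may begin at any frame of the recording.
    for j in range(m):
        cost[0][j] = frame_distance(query[0], recording[j])
        start[0][j] = j

    for i in range(1, n):
        cost[i][0] = cost[i - 1][0] + frame_distance(query[i], recording[0])
        start[i][0] = start[i - 1][0]
        for j in range(1, m):
            # Pick the cheapest predecessor: vertical, horizontal, diagonal.
            best_cost, best_start = min(
                (cost[i - 1][j], start[i - 1][j]),
                (cost[i][j - 1], start[i][j - 1]),
                (cost[i - 1][j - 1], start[i - 1][j - 1]),
            )
            cost[i][j] = best_cost + frame_distance(query[i], recording[j])
            start[i][j] = best_start

    # Free end: the alignment may finish at any frame of the recording.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end] / n, start[n - 1][end], end


def detect_word(query, recording, threshold):
    """Return (start, end) if the best alignment cost is within `threshold`,
    otherwise None. The threshold value is application-specific."""
    cost, start, end = subsequence_dtw(query, recording)
    return (start, end) if cost <= threshold else None
```

With a toy one-dimensional query `[[0.0], [1.0], [2.0]]` and the recording `[[5.0], [5.0], [0.0], [1.0], [2.0], [5.0]]`, `detect_word(query, recording, threshold=0.5)` reports frames 2 through 4, while a query that matches nothing in the recording is rejected by the threshold. The abstract applies the threshold to costs of path subsequences; this sketch thresholds only the total normalized cost, which is a simplification.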