Keyword-specific normalization based keyword spotting for spontaneous speech

2012 8th International Symposium on Chinese Spoken Language Processing | Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423490

Weifeng Li, Q. Liao


Citations: 4

Abstract


Keyword-specific normalization based keyword spotting for spontaneous speech

This paper presents a novel architecture for keyword spotting in spontaneous speech, in which the keyword model is trained from a small number of acoustic examples provided by a user. The word-spotting architecture scores patch feature-vector sequences extracted with sliding windows, then performs keyword-specific normalization and threshold setting. Two keyword models are proposed, dynamic time warping (DTW) based template matching and Gaussian mixture models (GMMs), with a separate GMM modeling the non-keywords. Keyword spotting experiments demonstrate the effectiveness of the proposed methods: the GMM log-likelihood-ratio method achieves about a 17% absolute improvement in recall over the baseline system.
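The pipeline described in the abstract (sliding-window patches, DTW template matching, keyword-specific normalization, thresholding) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the normalization is computed, so the z-score against background scores used here is an assumption, and all names (dtw_distance, sliding_patches, spot_keyword, background_scores) are hypothetical. In the GMM variant, the raw DTW score would presumably be replaced by a log-likelihood ratio between the keyword GMM and the non-keyword GMM.

```python
import math
from statistics import mean, pstdev


def frame_dist(a, b):
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(template, patch):
    """Length-normalized DTW alignment cost between two frame sequences."""
    n, m = len(template), len(patch)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(template[i - 1], patch[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def sliding_patches(features, win, hop):
    """Yield (start_frame, patch) for each window position."""
    for start in range(0, len(features) - win + 1, hop):
        yield start, features[start:start + win]


def spot_keyword(features, templates, background_scores, win, hop, threshold):
    """Return (start_frame, normalized_score) for patches passing the threshold.

    background_scores are raw scores of this keyword's templates against
    non-keyword audio. Assumption: normalization is a per-keyword z-score.
    """
    mu = mean(background_scores)
    sigma = pstdev(background_scores) or 1.0  # avoid division by zero
    hits = []
    for start, patch in sliding_patches(features, win, hop):
        # Negate the best (smallest) DTW cost so that higher means better.
        raw = -min(dtw_distance(t, patch) for t in templates)
        z = (raw - mu) / sigma
        if z >= threshold:
            hits.append((start, z))
    return hits


# Toy usage with 1-dimensional frames: the keyword is embedded at frame 6.
template = [[0.0], [1.0], [2.0], [1.0]]
utterance = [[5.0]] * 6 + template + [[5.0]] * 6
hits = spot_keyword(utterance, [template], [-2.0, -2.5, -1.5, -2.2],
                    win=4, hop=1, threshold=5.0)
```

Because the scores are normalized per keyword, a single threshold can be compared across keywords whose raw DTW costs live on different scales, which is the apparent motivation for keyword-specific normalization.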


Source journal

2012 8th International Symposium on Chinese Spoken Language Processing

Self-citation rate

0.00%

Publication count