{"title":"Using rhythmic features for Japanese spoken term detection","authors":"Naoyuki Kanda, Ryu Takeda, Y. Obuchi","doi":"10.1109/SLT.2012.6424217","DOIUrl":null,"url":null,"abstract":"A new rescoring method for spoken term detection (STD) is proposed. Phoneme-based close-matching techniques have been used because of their ability to detect out-of-vocabulary (OOV) queries. To improve the accuracy of phoneme-based techniques, rescoring techniques have been used to accurately re-rank the results from phoneme-based close-matching; however, conventional rescoring techniques based on an utterance verification model still produce many false detection results. To further improve the accuracy, in this study, several features representing the “naturalness” (or “abnormality”) of duration of phonemes/syllables in detected candidates of a keyword are proposed. These features are incorporated into a conventional rescoring technique using logistic regression. Experimental results with a 604-hour Japanese speech corpus indicated that combining the rhythmic features achieved a further relative error reduction of 8.9% compared to a conventional rescoring technique.","PeriodicalId":375378,"journal":{"name":"2012 IEEE Spoken Language Technology Workshop (SLT)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT.2012.6424217","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
Abstract
A new rescoring method for spoken term detection (STD) is proposed. Phoneme-based close-matching techniques have been used because of their ability to detect out-of-vocabulary (OOV) queries. To improve the accuracy of phoneme-based techniques, rescoring techniques have been used to accurately re-rank the results from phoneme-based close-matching; however, conventional rescoring techniques based on an utterance verification model still produce many false detection results. To further improve the accuracy, in this study, several features representing the “naturalness” (or “abnormality”) of duration of phonemes/syllables in detected candidates of a keyword are proposed. These features are incorporated into a conventional rescoring technique using logistic regression. Experimental results with a 604-hour Japanese speech corpus indicated that combining the rhythmic features achieved a further relative error reduction of 8.9% compared to a conventional rescoring technique.