Incorporating syllable duration into line-detection-based spoken term detection
Teppei Ohno, T. Akiba
2012 IEEE Spoken Language Technology Workshop (SLT), 2012-12-01
DOI: 10.1109/SLT.2012.6424223 (https://doi.org/10.1109/SLT.2012.6424223)
Citation count: 2
Abstract
A conventional method for spoken term detection (STD) applies approximate string matching to the subword sequences that speech recognition produces from a spoken document. An STD method that treats string matching as line detection in a syllable distance plane has been proposed. While that method has demonstrated fast detection with results ordered by distance, it still suffers from the insertion and deletion errors introduced by speech recognition. In this work, we aim to improve detection performance by employing syllable-duration information. The proposed method enables robust detection by introducing a distance plane whose units are frames rather than syllables. Our experimental evaluation showed that incorporating syllable duration achieved higher detection performance in high-recall regions.
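To make the conventional baseline concrete, the following is a minimal sketch of approximate string matching over a recognized syllable sequence, using the standard dynamic-programming formulation for approximate substring search. It is not the authors' implementation and it does not reproduce the line-detection or frame-based method. The function name `approx_find`, the 0/1 syllable distance, and the romanized example syllables are all assumptions made for illustration.

```python
def approx_find(query, document):
    """Return (distance, end) for the best approximate match of `query`
    inside `document`; both are lists of syllable labels.

    `distance` is the smallest edit distance between the query and any
    contiguous part of the document, and `end` is the (1-based) position
    in the document where that best match ends.
    Assumption: every substitution, insertion, and deletion costs 1.
    """
    # Row for the empty query: it matches anywhere at zero cost, which is
    # what lets a match start at any position in the document.
    prev = [0] * (len(document) + 1)
    for i, q in enumerate(query, start=1):
        # Matching the first i query syllables against nothing costs i.
        curr = [i]
        for j, d in enumerate(document, start=1):
            substitute = prev[j - 1] + (q != d)  # 0 if the syllables agree
            skip_query = prev[j] + 1             # query syllable unmatched
            skip_doc = curr[j - 1] + 1           # extra syllable in document
            curr.append(min(substitute, skip_query, skip_doc))
        prev = curr
    end = min(range(len(prev)), key=prev.__getitem__)
    return prev[end], end


# Hypothetical recognizer outputs: one clean, one with an inserted syllable.
query = ["ka", "i", "gi"]
clean = ["kyo", "no", "ka", "i", "gi", "wa"]
noisy = ["kyo", "no", "ka", "e", "i", "gi", "wa"]

print(approx_find(query, clean))
print(approx_find(query, noisy))
```

In this sketch a single inserted syllable raises the distance of a three-syllable query by one whole unit, whatever the duration of that syllable. That coarse per-syllable penalty is the kind of sensitivity to insertion and deletion errors that the abstract attributes to syllable-unit matching.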