A hierarchical system for word discovery exploiting DTW-based initialization

2013 IEEE Workshop on Automatic Speech Recognition and Understanding. Pub date: 2013-12-01. DOI: 10.1109/ASRU.2013.6707761

Oliver Walter, Timo Korthals, Reinhold Häb-Umbach, B. Raj


Citations: 38

Abstract

Discovering the linguistic structure of a language solely from spoken input requires two steps: phonetic and lexical discovery. The first identifies the inventory of categorical subword units and relates it to the underlying acoustics, while the second aims at discovering words as repeated patterns of subword units. The hierarchical approach presented here accounts for classification errors in the first stage by modelling the pronunciation of a word probabilistically in terms of subword units: a hidden Markov model with discrete emission probabilities that emits the observed subword unit sequences. We describe how the system can be learned from spoken input in a completely unsupervised fashion. To improve the initialization of the word pronunciation training, the output of a dynamic time warping (DTW) based acoustic pattern discovery system is used, since that system is able to discover similar temporal sequences in the input data. This improved initialization, which uses only weak supervision, led to a 40% reduction in word error rate on a digit recognition task.
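The abstract names two ingredients: a word pronunciation model that is an HMM with discrete emissions over subword units, and DTW-based matching of acoustic sequences. The Python sketch below illustrates both in their textbook form under stated assumptions; it is not the authors' implementation. The function names `dtw_cost` and `forward_likelihood` and all model parameters are hypothetical, chosen only for illustration. The DTW shown is classic whole-sequence DTW, a simplification of a pattern discovery system that searches for similar temporal sequences within the data, and neither the unsupervised training of the pronunciation models nor the way the DTW output seeds their initialization is reproduced here.

```python
import numpy as np


def dtw_cost(x: np.ndarray, y: np.ndarray) -> float:
    """Classic (whole-sequence) DTW: minimal cumulative Euclidean frame
    distance between feature sequences x (n, d) and y (m, d), using the
    step pattern (1,0), (0,1), (1,1). Hypothetical helper, not the paper's."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])


def forward_likelihood(obs, pi, A, B) -> float:
    """Forward algorithm for an HMM with discrete emissions.
    obs: observed subword-unit indices; pi: (S,) initial probabilities;
    A: (S, S) transitions; B: (S, K) emission probabilities P(unit k | state s).
    Returns P(obs | model), summed over all final states."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())


# Toy 3-state left-to-right word model over a 4-unit inventory
# (numbers are illustrative assumptions, not from the paper). Off-diagonal
# emission mass lets the model absorb subword recognition errors.
pi = np.array([1.0, 0.0, 0.0])
A = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
B = np.array([[0.80, 0.10, 0.05, 0.05],
              [0.10, 0.80, 0.05, 0.05],
              [0.05, 0.05, 0.80, 0.10]])

p_clean = forward_likelihood([0, 1, 2], pi, A, B)  # expected unit string
p_noisy = forward_likelihood([0, 3, 2], pi, A, B)  # unit 1 misrecognized as 3
print(p_clean, p_noisy)

# Two feature sequences that differ only in timing align at zero DTW cost.
x = np.array([[0.0], [1.0], [2.0]])
y = np.array([[0.0], [0.0], [1.0], [2.0]])
print(dtw_cost(x, y))
```

In this toy setup the noisy unit string still receives a non-zero (though smaller) likelihood than the clean one, which is the sense in which a probabilistic pronunciation model can tolerate classification errors from the first stage.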



