2013 IEEE Workshop on Automatic Speech Recognition and Understanding: Latest Papers

Unsupervised induction and filling of semantic slots for spoken dialogue systems using frame-semantic parsing

2013 IEEE Workshop on Automatic Speech Recognition and Understanding Pub Date : 2013-12-01 DOI: 10.1109/ASRU.2013.6707716

Yun-Nung (Vivian) Chen, William Yang Wang, Alexander I. Rudnicky

Citations: 88

DNN acoustic modeling with modular multi-lingual feature extraction networks

2013 IEEE Workshop on Automatic Speech Recognition and Understanding Pub Date : 2013-12-01 DOI: 10.1109/ASRU.2013.6707754

Jonas Gehring, Quoc Bao Nguyen, Florian Metze, A. Waibel

Abstract: In this work, we propose several deep neural network architectures that are able to leverage data from multiple languages. Modularity is achieved by training networks for extracting high-level features and for estimating phoneme state posteriors separately, and then combining them for decoding in a hybrid DNN/HMM setup. This approach has been shown to achieve superior performance for single-language systems, and here we demonstrate that feature extractors benefit significantly from being trained as multi-lingual networks with shared hidden representations. We also show that existing mono-lingual networks can be re-used in a modular fashion to achieve a similar level of performance without having to train new networks on multi-lingual data. Furthermore, we investigate extending these architectures to make use of language-specific acoustic features. Evaluations are performed on a low-resource conversational telephone speech transcription task in Vietnamese, while additional data for acoustic model training is provided in Pashto, Tagalog, Turkish, and Cantonese. Improvements of up to 17.4% and 13.8% over mono-lingual GMMs and DNNs, respectively, are obtained.

Citations: 12

Fixed-dimensional acoustic embeddings of variable-length segments in low-resource settings

2013 IEEE Workshop on Automatic Speech Recognition and Understanding Pub Date : 2013-12-01 DOI: 10.1109/ASRU.2013.6707765

Keith D. Levin, Katharine Henry, A. Jansen, Karen Livescu

Abstract: Measures of acoustic similarity between words or other units are critical for segmental exemplar-based acoustic models, spoken term discovery, and query-by-example search. Dynamic time warping (DTW) alignment cost has been the most commonly used measure, but it has well-known inadequacies. Some recently proposed alternatives require large amounts of training data. In the interest of finding more efficient, accurate, and low-resource alternatives, we consider the problem of embedding speech segments of arbitrary length into fixed-dimensional spaces in which simple distances (such as cosine or Euclidean) serve as a proxy for linguistically meaningful (phonetic, lexical, etc.) dissimilarities. Such embeddings would enable efficient audio indexing and permit application of standard distance learning techniques to segmental acoustic modeling. In this paper, we explore several supervised and unsupervised approaches to this problem and evaluate them on an acoustic word discrimination task. We identify several embedding algorithms that match or improve upon the DTW baseline in low-resource settings.
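
To make the contrast in this abstract concrete, here is a minimal sketch of the two kinds of similarity measure it mentions: a DTW alignment cost computed on the raw frame sequences, and a cosine distance computed on fixed-dimensional embeddings. The embedding used here (uniform downsampling to `k` frames) is an illustrative assumption chosen for brevity, and the function names are hypothetical; neither should be read as the specific algorithms evaluated in the paper.

```python
import math


def dtw_cost(a, b):
    """DTW alignment cost between two sequences of feature vectors.

    Uses Euclidean distance between frames and the standard
    (insert, delete, match) step pattern; no path-length normalization.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def downsample_embed(seq, k):
    """Map a variable-length segment to a fixed-length vector.

    Assumption for illustration: pick k frames at uniformly spaced
    positions and concatenate them, giving k * frame_dim values.
    """
    n = len(seq)
    idx = [min(n - 1, (2 * i + 1) * n // (2 * k)) for i in range(k)]
    return [x for i in idx for x in seq[i]]


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (nu * nv)


# Two toy "segments": the second is a time-stretched copy of the first.
seg_a = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
seg_b = [[0.0, 1.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [2.0, 1.0]]

emb_a = downsample_embed(seg_a, 3)
emb_b = downsample_embed(seg_b, 3)
```

DTW needs a quadratic-time alignment for every pair of segments, whereas the embeddings are computed once per segment and then compared with a single vector operation, which is what makes indexing practical.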

Citations: 120

Semi-supervised training of Deep Neural Networks

2013 IEEE Workshop on Automatic Speech Recognition and Understanding Pub Date : 2013-12-01 DOI: 10.1109/ASRU.2013.6707741

Karel Veselý, M. Hannemann, L. Burget

Citations: 137

Effective pseudo-relevance feedback for language modeling in speech recognition

2013 IEEE Workshop on Automatic Speech Recognition and Understanding Pub Date : 2013-12-01 DOI: 10.1109/ASRU.2013.6707698

Berlin Chen, Yi-Wen Chen, Kuan-Yu Chen, E. Jan

Abstract: Language modeling (LM) is part and parcel of any automatic speech recognition (ASR) system: it helps to constrain the acoustic analysis, guide the search through multiple candidate word strings, and quantify the acceptability of the final output hypothesis given an input utterance. Although the n-gram model remains the predominant one, a number of novel and ingenious LM methods have been developed to complement or replace it. A more recent line of research leverages information cues gleaned from pseudo-relevance feedback (PRF) to derive an utterance-regularized language model that complements the n-gram model. This paper continues this general line of research, and its main contribution is two-fold. First, we explore an alternative and more efficient formulation for constructing such an utterance-regularized language model for ASR. Second, the utilities of various utterance-regularized language models are analyzed and compared extensively. Empirical experiments on a large vocabulary continuous speech recognition (LVCSR) task demonstrate that our proposed language models can offer substantial improvements over the baseline n-gram system, and achieve performance competitive with, or better than, some state-of-the-art language models.
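
As a rough illustration of the general idea of complementing an n-gram model with evidence from feedback documents, the sketch below estimates a unigram distribution from a handful of retrieved documents and linearly interpolates it with a background n-gram probability. This is a generic interpolation scheme under stated assumptions, not the formulation proposed in the paper; the function names, the whitespace tokenization, and the weight `lam` are all hypothetical.

```python
from collections import Counter


def unigram_from_feedback(docs):
    """Maximum-likelihood unigram model over the top-ranked feedback documents.

    Assumption: documents are plain strings tokenized on whitespace.
    """
    counts = Counter(word for doc in docs for word in doc.split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(p_ngram, p_feedback, lam=0.3):
    """Linear interpolation of a background n-gram probability with a
    feedback-derived probability; lam is a hypothetical tuning weight."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * p_ngram + lam * p_feedback


# Toy feedback set standing in for documents retrieved with a first-pass hypothesis.
feedback_model = unigram_from_feedback(["speech recognition", "speech model"])
p_speech = interpolate(0.1, feedback_model.get("speech", 0.0), lam=0.5)
```

In a real system the interpolation weight would be tuned on held-out data, and words unseen in the feedback documents fall back to the background model, as the `.get(..., 0.0)` call suggests.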

Citations: 0

Semantic entity detection from multiple ASR hypotheses within the WFST framework

2013 IEEE Workshop on Automatic Speech Recognition and Understanding Pub Date : 2013-12-01 DOI: 10.1109/ASRU.2013.6707710

J. Svec, P. Ircing, L. Smídl

Abstract: The paper presents a novel approach to named entity detection from ASR lattices. Since the described method not only detects the named entities but also assigns a detailed semantic interpretation to them, we call our approach semantic entity detection. All the algorithms are designed to use automata operations defined within the framework of weighted finite state transducers (WFST); ASR lattices are nowadays frequently represented as weighted acceptors. The expert knowledge about the semantics of the task at hand can first be expressed in the form of a context-free grammar and then converted to the FST form. We use WFST optimization to obtain a compact representation of the ASR lattice. The WFST framework also allows word confusion networks to be used as another representation of multiple ASR hypotheses. That way we can use the full power of the composition and optimization operations implemented in the OpenFST toolkit for our semantic entity detection algorithm. The devised method also employs the concept of a factor automaton; this approach allows us to overcome the need for a filler model and consequently makes the method more general. The paper includes an experimental evaluation of the proposed algorithm and compares the performance obtained by using the one-best word hypothesis, optimized lattices, and word confusion networks.

Citations: 12

Learning better lexical properties for recurrent OOV words

2013 IEEE Workshop on Automatic Speech Recognition and Understanding Pub Date : 2013-12-01 DOI: 10.1109/ASRU.2013.6707699

Longlu Qin, Alexander I. Rudnicky

Citations: 4

Compact acoustic modeling based on acoustic manifold using a mixture of factor analyzers

2013 IEEE Workshop on Automatic Speech Recognition and Understanding Pub Date : 2013-12-01 DOI: 10.1109/ASRU.2013.6707702

Wenlin Zhang, Bi-cheng Li, Weiqiang Zhang

Abstract: A compact acoustic model for speech recognition is proposed based on nonlinear manifold modeling of the acoustic feature space. Acoustic features of the speech signal are assumed to form a low-dimensional manifold, which is modeled by a mixture of factor analyzers. Each factor analyzer describes a local area of the manifold using a low-dimensional linear model. For an HMM-based speech recognition system, observations of a particular state are constrained to be located on part of the manifold, which may cover several factor analyzers. For each tied state, a sparse weight vector is obtained through an iteration shrinkage algorithm, in which the sparseness is determined automatically by the training data. For each nonzero component of the weight vector, a low-dimensional factor is estimated for the corresponding factor model according to the maximum a posteriori (MAP) criterion, resulting in a compact state model. Experimental results show that compared with the conventional HMM-GMM system and the SGMM system, the new method not only contains fewer parameters, but also yields better recognition results.
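
One way to see why a factor-analyzer component can be more compact than a full-covariance Gaussian is to count covariance parameters. The sketch below uses the textbook factor analysis model, in which the covariance is a loading matrix product plus a diagonal noise term, and compares its raw parameter count with that of a full symmetric covariance. The feature dimension (39) and number of factors (5) are hypothetical values chosen for illustration, and the paper's actual parameterization (shared factor analyzers with sparse state weights) is not reproduced here.

```python
def full_covariance_params(dim):
    """Free parameters in a full symmetric covariance matrix of size dim x dim."""
    return dim * (dim + 1) // 2


def factor_analyzer_params(dim, num_factors):
    """Raw covariance parameter count for a textbook factor analyzer:
    a dim x num_factors loading matrix plus a diagonal noise vector of
    length dim. Rotational redundancy of the loadings is ignored."""
    return dim * num_factors + dim


# Hypothetical sizes: 39-dimensional features, 5 latent factors.
full_count = full_covariance_params(39)
fa_count = factor_analyzer_params(39, 5)
```

With these example sizes the full covariance needs 780 values and the factor analyzer 234, which gives a sense of the savings available per component before any sharing across states.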

Citations: 3

Discriminative semi-supervised training for keyword search in low resource languages

2013 IEEE Workshop on Automatic Speech Recognition and Understanding Pub Date : 2013-12-01 DOI: 10.1109/ASRU.2013.6707770

Roger Hsiao, Tim Ng, F. Grézl, D. Karakos, S. Tsakalidis, L. Nguyen, R. Schwartz

Citations: 25

An empirical study of confusion modeling in keyword search for low resource languages

2013 IEEE Workshop on Automatic Speech Recognition and Understanding Pub Date : 2013-12-01 DOI: 10.1109/ASRU.2013.6707774

M. Saraçlar, A. Sethy, B. Ramabhadran, L. Mangu, Jia Cui, Xiaodong Cui, Brian Kingsbury, Jonathan Mamou

Citations: 42