2011 IEEE Workshop on Automatic Speech Recognition & Understanding: Latest Papers

Randomized maximum entropy language models
2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date: 2011-12-01 DOI: 10.1109/ASRU.2011.6163935
Puyang Xu, S. Khudanpur, A. Gunawardana
Abstract: We address the memory problem of maximum entropy language models (MELM) with very large feature sets. Randomized techniques are employed to remove all large, exact data structures in MELM implementations. To avoid the dictionary structure that maps each feature to its corresponding weight, the feature hashing trick [1] [2] can be used. We also replace the explicit storage of features with a Bloom filter. We show with extensive experiments that false positive errors of Bloom filters and random hash collisions do not degrade model performance. Both perplexity and WER improvements are demonstrated by building MELM that would otherwise be prohibitively large to estimate or store.
Citations: 5
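The two randomized structures named in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the use of BLAKE2b as the hash, the table sizes, and the toy bigram features are all assumptions made for illustration.

```python
import hashlib


def _hash(key: str, seed: int, modulus: int) -> int:
    """Deterministic hash of `key` into [0, modulus), salted by `seed`."""
    digest = hashlib.blake2b(f"{seed}:{key}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % modulus


class BloomFilter:
    """Approximate set membership: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        return (_hash(key, seed, self.num_bits) for seed in range(self.num_hashes))

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


class HashedWeights:
    """Weight table indexed by a hashed feature; colliding features share a weight."""

    def __init__(self, num_buckets: int) -> None:
        self.num_buckets = num_buckets
        self.weights = [0.0] * num_buckets

    def index(self, feature: str) -> int:
        # A seed outside the Bloom filter's range keeps the two hash families apart.
        return _hash(feature, seed=-1, modulus=self.num_buckets)


def score(features, seen: BloomFilter, table: HashedWeights) -> float:
    """Sum the weights of the features that the Bloom filter reports as known."""
    return sum(table.weights[table.index(f)] for f in features if f in seen)


# Hypothetical toy model with two bigram features.
seen = BloomFilter(num_bits=1 << 16, num_hashes=4)
table = HashedWeights(num_buckets=1 << 12)
for feat, weight in [("the|cat", 0.5), ("cat|sat", -0.25)]:
    seen.add(feat)
    table.weights[table.index(feat)] += weight
```

Neither structure stores the feature strings themselves, which is where the memory saving comes from; the price is that an unseen feature can occasionally pass the membership test or share a bucket with another feature.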
Efficient discriminative training of long-span language models
2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date: 2011-12-01 DOI: 10.1109/ASRU.2011.6163933
A. Rastrow, Mark Dredze, S. Khudanpur
Abstract: Long-span language models, such as those involving syntactic dependencies, produce more coherent text than their n-gram counterparts. However, evaluating the large number of sentence hypotheses in a packed representation such as an ASR lattice is intractable under such long-span models, both during decoding and during discriminative training. The accepted compromise is to rescore only the N-best hypotheses in the lattice using the long-span LM. We present discriminative hill climbing, an efficient and effective discriminative training procedure for long-span LMs based on a hill climbing rescoring algorithm [1]. We empirically demonstrate significant computational savings as well as error-rate reduction over N-best training methods in a state-of-the-art ASR system for Broadcast News transcription.
Citations: 5
On-line policy optimisation of spoken dialogue systems via live interaction with human subjects
2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date: 2011-12-01 DOI: 10.1109/ASRU.2011.6163950
Milica Gasic, Filip Jurčíček, Blaise Thomson, Kai Yu, S. Young
Abstract: Statistical dialogue models have required a large number of dialogues to optimise the dialogue policy, relying on the use of a simulated user. This results in a mismatch between training and live conditions, and in significant development costs for the simulator, thereby diminishing many of the claimed benefits of such models. Recent work on Gaussian process reinforcement learning has shown that learning can be substantially accelerated. This paper reports on an experiment to learn a policy for a real-world task directly from human interaction, using rewards provided by users. It shows that a usable policy can be learnt in just a few hundred dialogues without needing a user simulator, using a learning strategy that reduces the risk of taking bad actions. The paper also investigates adaptation behaviour when the system continues learning for several thousand dialogues and highlights the need for robustness to noisy rewards.
Citations: 83
Multi-level context-dependent acoustic modeling for automatic speech recognition
2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date: 2011-12-01 DOI: 10.1109/ASRU.2011.6163911
Hung-An Chang, James R. Glass
Abstract: In this paper, we propose a multi-level, context-dependent acoustic modeling framework for automatic speech recognition. For each context-dependent unit considered by the recognizer, we construct a set of classifiers that target different amounts of contextual resolution, and then combine them for scoring. Since information from multiple levels of context is appropriately combined, the proposed modeling framework provides reasonable scores for units with few or no training examples, while maintaining an ability to distinguish between different context-dependent units. On a large vocabulary lecture transcription task, the proposed modeling framework outperforms a traditional clustering-based context-dependent acoustic model by 3.5% absolute (11.4% relative) in terms of word error rate.
Citations: 3
A convergence analysis of log-linear training and its application to speech recognition
2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date: 2011-12-01 DOI: 10.1109/ASRU.2011.6163895
Simon Wiesler, R. Schlüter, H. Ney
Abstract: Log-linear models are a promising approach for speech recognition. Typically, log-linear models are trained according to a strictly convex criterion. Optimization algorithms are guaranteed to converge to the unique global optimum of the objective function from any initialization. For large-scale applications, considerations in the limit of infinite iterations are not sufficient. We show that log-linear training can be a highly ill-conditioned optimization problem, resulting in extremely slow convergence. Conversely, the optimization problem can be preconditioned by feature transformations. Making use of our convergence analysis, we improve our log-linear speech recognition system and achieve a strong reduction of its training time. In addition, we validate our analysis on a continuous handwriting recognition task.
Citations: 15
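The link between feature scaling and conditioning mentioned in the abstract above can be made concrete with a toy calculation. The sketch below is an assumption-laden stand-in, not the paper's analysis: it uses the second-moment matrix of two made-up features as a proxy for the curvature of the training objective, and mean-and-variance normalization as one possible feature transformation.

```python
import math


def second_moment(rows):
    """Return (a, b, d) for the symmetric 2x2 matrix (1/n) * X^T X = [[a, b], [b, d]]."""
    n = len(rows)
    a = sum(x * x for x, _ in rows) / n
    b = sum(x * y for x, y in rows) / n
    d = sum(y * y for _, y in rows) / n
    return a, b, d


def condition_number(a, b, d):
    """Ratio of the largest to the smallest eigenvalue of [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    largest = mean + radius
    # The eigenvalue product equals the determinant; this avoids cancellation.
    smallest = (a * d - b * b) / largest
    return largest / smallest


def standardize(rows):
    """Shift each feature to zero mean and scale it to unit variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]


# Hypothetical data: one feature in the hundreds, the other in the hundredths.
raw = [(100.0, 0.01), (300.0, 0.03), (200.0, 0.05), (400.0, 0.02)]
kappa_raw = condition_number(*second_moment(raw))
kappa_std = condition_number(*second_moment(standardize(raw)))
print(f"condition number: raw {kappa_raw:.3g}, standardized {kappa_std:.3g}")
```

First-order methods slow down as the condition number grows, so the drop of many orders of magnitude after rescaling is the effect a preconditioning transformation aims for.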
Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription
2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date: 2011-12-01 DOI: 10.1109/ASRU.2011.6163899
F. Seide, Gang Li, Xie Chen, Dong Yu
Abstract: We investigate the potential of Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, from a feature-engineering perspective. We recently showed that for speaker-independent transcription of phone calls (NIST RT03S Fisher data), CD-DNN-HMMs reduced the word error rate by as much as one third, from 27.4% (obtained by discriminatively trained Gaussian-mixture HMMs with HLDA features) to 18.5%, using 300+ hours of training data (Switchboard), 9000+ tied triphone states, and up to 9 hidden network layers.
Citations: 690
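The "one third" figure in the abstract above is a relative reduction, which is easy to confuse with the absolute difference in percentage points. A short check of the arithmetic, using only the two WER values quoted in the abstract (the function name is an illustrative choice):

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, expressed as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


absolute = 27.4 - 18.5                      # difference in percentage points
relative = relative_reduction(27.4, 18.5)   # fraction of the baseline
print(f"{absolute:.1f} points absolute, {relative:.1%} relative")
# prints "8.9 points absolute, 32.5% relative"
```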
A Trajectory-based Parallel Model Combination with a unified static and dynamic parameter compensation for noisy speech recognition
2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date: 2011-12-01 DOI: 10.1109/ASRU.2011.6163914
K. Sim, Minh-Thang Luong
Abstract: Parallel Model Combination (PMC) is widely used as a technique to compensate Gaussian parameters of a clean speech model for noisy speech recognition. The basic principle of PMC uses a log-normal approximation to transform statistics of the data distribution between the cepstral domain and the linear spectral domain. Typically, further approximations are needed to compensate the dynamic parameters separately. In this paper, Trajectory PMC (TPMC) is proposed to compensate both the static and dynamic parameters. TPMC uses the explicit relationships between the static and dynamic features to transform the static and dynamic parameters into a sequence (trajectory) of static parameters, so that the log-normal approximation can be applied. Experimental results on the WSJCAM0 database corrupted with additive babble noise reveal that the proposed TPMC method gives promising improvements over PMC and VTS.
Citations: 6
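The log-normal approximation that the abstract above refers to can be sketched for a single log-spectral dimension. This is the textbook moment-matching step of conventional PMC for static parameters only, not the proposed TPMC method; it also skips the DCT that maps between the cepstral and log-spectral domains, and the function names and the `gain` parameter are assumptions.

```python
import math


def log_to_linear(mu_log: float, var_log: float):
    """Mean and variance of exp(x) when x is Gaussian with the given moments."""
    mu_lin = math.exp(mu_log + var_log / 2.0)
    var_lin = mu_lin ** 2 * (math.exp(var_log) - 1.0)
    return mu_lin, var_lin


def linear_to_log(mu_lin: float, var_lin: float):
    """Log-normal moment matching: the inverse of log_to_linear."""
    var_log = math.log(var_lin / mu_lin ** 2 + 1.0)
    mu_log = math.log(mu_lin) - var_log / 2.0
    return mu_log, var_log


def combine(speech, noise, gain: float = 1.0):
    """Compensate one log-spectral Gaussian (mean, variance) for additive noise."""
    s_mu, s_var = log_to_linear(*speech)
    n_mu, n_var = log_to_linear(*noise)
    # Speech and noise are assumed independent and additive in the linear domain.
    return linear_to_log(gain * s_mu + n_mu, gain ** 2 * s_var + n_var)


# Hypothetical clean-speech and noise statistics for one dimension.
noisy_mu, noisy_var = combine(speech=(1.0, 0.5), noise=(0.0, 0.2))
```

The sum of two log-normal variables is not itself log-normal, which is why the final mapping back to the log domain is an approximation rather than an exact result.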
iVector-based discriminative adaptation for automatic speech recognition
2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date: 2011-12-01 DOI: 10.1109/ASRU.2011.6163922
M. Karafiát, L. Burget, P. Matejka, O. Glembek, J. Černocký
Abstract: We present a novel technique for discriminative feature-level adaptation of automatic speech recognition systems. The concept of iVectors, popular in speaker recognition, is used to extract information about the speaker or acoustic environment from a speech segment. An iVector is a low-dimensional, fixed-length representation of such information. To utilize iVectors for adaptation, Region Dependent Linear Transforms (RDLT) are discriminatively trained using the MPE criterion on a large amount of annotated data to extract the relevant information from iVectors and to compensate the speech features. The approach was tested on standard CTS data. We found it to be complementary to common adaptation techniques. On a well-tuned RDLT system with standard CMLLR adaptation, we reached an additional 0.8% absolute WER improvement.
Citations: 92
Extending noise robust structured support vector machines to larger vocabulary tasks
2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date: 2011-12-01 DOI: 10.1109/ASRU.2011.6163898
Shi-Xiong Zhang, M. Gales
Abstract: This paper describes a structured SVM framework suitable for noise-robust medium/large vocabulary speech recognition. Several theoretical and practical extensions to previous work on small vocabulary tasks are detailed. The joint feature space based on word models is extended to allow context-dependent triphone models to be used. Interpreting the structured SVM as a large margin log-linear model shows that there is an implicit assumption that the prior of the discriminative parameter is a zero-mean Gaussian. However, depending on the definition of the likelihood feature space, a non-zero prior may be more appropriate. A general Gaussian prior is incorporated into the large margin training criterion in a form that allows the cutting plane algorithm to be directly applied. To further speed up the training process, a 1-slack algorithm, caching of competing hypotheses, and parallelization strategies are also proposed. The performance of structured SVMs is evaluated on a noise-corrupted medium vocabulary speech recognition task: AURORA 4.
Citations: 7
Subword-based automatic lexicon learning for Speech Recognition
2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date: 2011-12-01 DOI: 10.1109/ASRU.2011.6163938
Timo Mertens, S. Seneff
Abstract: We present a framework for learning a pronunciation lexicon for an Automatic Speech Recognition (ASR) system from multiple utterances of the same training words, where the lexical identities of the words are unknown. Instead of only trying to learn pronunciations for known words, we go one step further and try to learn both spelling and pronunciation in a joint optimization. Decoding based on linguistically motivated hybrid subword units generates the joint lexical search space, which is reduced to the most appropriate lexical entries based on a set of simple pruning techniques. A cascade of letter and acoustic pruning, followed by re-scoring N-best hypotheses with discriminative decoder statistics, resulted in optimal lexical entries in terms of both spelling and pronunciation. Evaluating the framework on English isolated word recognition, we achieve reductions of 7.7% absolute on word error rate and 20.9% absolute on character error rate over baselines that use no pruning.
Citations: 1