2009 IEEE Workshop on Automatic Speech Recognition & Understanding: Latest Publications

Lattice-based lexical cues for word fragment detection in conversational speech
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2009-12-01 DOI: 10.1109/ASRU.2009.5373419
Kartik Audhkhasi, P. Georgiou, Shrikanth S. Narayanan
Abstract: Previous approaches to the problem of word fragment detection in speech have focused primarily on acoustic-prosodic features [1], [2]. This paper proposes that the output of a continuous Automatic Speech Recognition (ASR) system can also be used to derive robust lexical features for the task. We hypothesize that the confusion in the word lattice generated by the ASR system can be exploited for detecting word fragments. Two sets of lexical features are proposed: one based on the word confusion, and the other based on the pronunciation confusion between the word hypotheses in the lattice. Classification experiments with a Support Vector Machine (SVM) classifier show that these lexical features perform better than the previously proposed acoustic-prosodic features by around 5.20% (relative) on a corpus chosen from the DARPA Transtac Iraqi-English (San Diego) corpus [3]. A combination of both these feature sets improves the word fragment detection accuracy by 11.50% relative to using just the acoustic-prosodic features.
Citations: 1
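The word-confusion idea above is concrete enough to sketch. Below is a minimal, hypothetical illustration (not the paper's actual feature set): the entropy of the posterior mass competing at one lattice position, which should be higher around fragments, where the recognizer is confused. The `slot_entropy` helper and the example posteriors are assumptions for illustration only.

```python
import math

def slot_entropy(word_posteriors):
    """Entropy (in bits) of the competing word hypotheses at one
    lattice position; higher entropy means more confusion, which the
    paper hypothesizes correlates with word fragments."""
    total = sum(word_posteriors.values())
    probs = [p / total for p in word_posteriors.values()]
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A confident slot vs. a confused one around a possible fragment.
clear = {"hello": 0.95, "hollow": 0.05}
confused = {"fra-": 0.3, "fragment": 0.25, "fraction": 0.25, "frag": 0.2}
print(slot_entropy(clear) < slot_entropy(confused))  # True
```

A per-slot entropy like this could be pooled over a word's time span to produce one feature value per hypothesized word for the SVM.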
Back-off action selection in summary space-based POMDP dialogue systems
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2009-12-01 DOI: 10.1109/ASRU.2009.5373416
Milica Gasic, F. Lefèvre, Filip Jurcícek, Simon Keizer, François Mairesse, Blaise Thomson, Kai Yu, S. Young
Abstract: This paper deals with the issue of invalid state-action pairs in the Partially Observable Markov Decision Process (POMDP) framework, with a focus on real-world tasks where the need for approximate solutions exacerbates this problem. In particular, when modelling dialogue as a POMDP, both the state and the action space must be reduced to smaller scale summary spaces in order to make learning tractable. However, since not all actions are valid in all states, the action proposed by the policy in summary space sometimes leads to an invalid action when mapped back to master space. Some form of back-off scheme must then be used to generate an alternative action. This paper demonstrates how the value function derived during reinforcement learning can be used to order back-off actions in an N-best list. Compared to a simple baseline back-off strategy and to a strategy that extends the summary space to minimise the occurrence of invalid actions, the proposed N-best action selection scheme is shown to be significantly more robust.
Citations: 15
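The N-best back-off scheme described above reduces to a simple loop: rank candidate summary actions by their learned value and take the first one that survives the mapping to master space. A minimal sketch, with toy action names and Q-values that are assumptions for illustration:

```python
def select_backoff_action(summary_actions, q_value, is_valid_in_master):
    """N-best back-off: rank summary actions by the value function
    learned during reinforcement learning, then return the first one
    whose mapping back to master space yields a valid action."""
    for action in sorted(summary_actions, key=q_value, reverse=True):
        if is_valid_in_master(action):
            return action
    raise RuntimeError("no summary action maps to a valid master action")

# Toy example: 'offer' scores highest but is invalid in the current state.
q = {"confirm": 2.1, "offer": 3.4, "request": 1.0}
best = select_backoff_action(q, q.get, lambda a: a != "offer")
print(best)  # confirm
```

The point of the paper is that this value-ordered list degrades gracefully, unlike a fixed fallback action or an enlarged summary space.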
Robust speech recognition using a Small Power Boosting algorithm
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2009-12-01 DOI: 10.1109/ASRU.2009.5373230
Chanwoo Kim, Kshitiz Kumar, R. Stern
Abstract: In this paper, we present a noise robustness algorithm called Small Power Boosting (SPB). We observe that in the spectral domain, time-frequency bins with smaller power are more affected by additive noise. The conventional way of handling this problem is estimating the noise from the test utterance and doing normalization or subtraction. In our work, in contrast, we intentionally boost the power of time-frequency bins with small energy for both the training and testing datasets. Since time-frequency bins with small power no longer exist after this power boosting, the spectral distortion between the clean and corrupt test sets becomes reduced. This type of small power boosting is also highly related to physiological nonlinearity. We observe that when small power boosting is done, suitable weighting smoothing becomes highly important. Our experimental results indicate that this simple idea is very helpful for very difficult noisy environments such as corruption by background music.
Citations: 23
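The core intuition can be sketched with a deliberately simplified stand-in: raise every low-energy bin to a floor tied to the utterance peak, applied identically to training and test data. The paper's actual boosting nonlinearity and the subsequent weight smoothing are more elaborate; the flooring rule and the 1% ratio below are assumptions for illustration.

```python
def boost_small_power(power_spectrum, floor_ratio=0.01):
    """Simplified stand-in for Small Power Boosting: raise every
    time-frequency bin below a fixed fraction of the utterance peak
    up to that floor. Because low-energy bins, the ones most distorted
    by additive noise, are boosted in both training and test data,
    the mismatch between clean and corrupted sets shrinks."""
    peak = max(power_spectrum)
    floor = floor_ratio * peak
    return [max(p, floor) for p in power_spectrum]

bins = [100.0, 0.005, 3.0, 0.2]
print(boost_small_power(bins))  # [100.0, 1.0, 3.0, 1.0]
```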
Dynamic network decoding revisited
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2009-12-01 DOI: 10.1109/ASRU.2009.5372904
H. Soltau, G. Saon
Abstract: We present a dynamic network decoder capable of using large cross-word context models and large n-gram histories. Our method for constructing the search network is designed to process large cross-word context models very efficiently, and we address the optimization of the search network to minimize any overhead during run-time for the dynamic network decoder. The search procedure uses the full LM history for lookahead, and path recombination is done as early as possible. In our systematic comparison to a static FSM based decoder, we find the dynamic decoder can run at comparable speed to the static decoder when large language models are used, while the static decoder performs best for small language models. We discuss the use of very large vocabularies of up to 2.5 million words for both decoding approaches and analyze the effect of weak acoustic models for pruning.
Citations: 50
Support vector machines for noise robust ASR
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2009-12-01 DOI: 10.1109/ASRU.2009.5372913
M. Gales, A. Ragni, H. AlDamarki, C. Gautier
Abstract: Using discriminative classifiers, such as Support Vector Machines (SVMs), in combination with, or as an alternative to, Hidden Markov Models (HMMs) has a number of advantages for difficult speech recognition tasks. For example, the models can exploit dependencies in the observation sequences beyond those captured by HMMs, provided the appropriate form of kernel is used. However, standard SVMs are binary classifiers, and speech is a multi-class problem. Furthermore, to train SVMs to distinguish word pairs requires that each word appears in the training data. This paper examines both of these limitations. Tree-based reduction approaches for multiclass classification are described, as well as some of the issues in applying them to dynamic data, such as speech. To address the training data issues, a simplified version of HMM-based synthesis can be used, which allows data for any word-pair to be generated. These approaches are evaluated on two noise corrupted digit sequence tasks: AURORA 2.0, and actual in-car collected data.
Citations: 31
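The tree-based reduction mentioned above turns one N-way decision into a chain of binary ones. A minimal sketch, where an oracle lambda stands in for a trained binary SVM (the class names and splitting-in-half scheme are illustrative assumptions, not the paper's exact tree construction):

```python
def tree_classify(classes, choose_left_half):
    """Tree-based reduction of multi-class classification to binary
    decisions: split the candidate classes in half, ask a binary
    classifier which half the sample belongs to, and repeat.
    This needs about log2(N) decisions at test time instead of
    evaluating N*(N-1)/2 pairwise SVMs."""
    while len(classes) > 1:
        mid = len(classes) // 2
        left, right = classes[:mid], classes[mid:]
        classes = left if choose_left_half(left, right) else right
    return classes[0]

digits = ["zero", "one", "two", "three"]
# Stand-in binary classifier that always prefers the half holding "two".
print(tree_classify(digits, lambda l, r: "two" in l))  # two
```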
Improved vocabulary independent search with approximate match based on Conditional Random Fields
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2009-12-01 DOI: 10.1109/ASRU.2009.5373323
U. Chaudhari, M. Picheny
Abstract: We investigate the use of Conditional Random Fields (CRF) to model confusions and account for errors in the phonetic decoding derived from Automatic Speech Recognition output. The goal is to improve the accuracy of approximate phonetic match, given query terms and an indexed database of documents, in a vocabulary independent audio search system. Audio data is ingested, segmented, decoded to produce a sequence of phones, and subsequently indexed using phone N-grams. Search is performed by expanding queries into phone sequences and matching against the index. The approximate match score is derived from a CRF, trained on parallel transcripts, which provides a general framework for modeling the errors that a recognition system may make, taking contextual effects into consideration. Our approach differs from other work in the field in that we focus on using CRFs to model context-dependent phone-level confusions, rather than on explicitly modeling parameters of an edit distance. While the results we obtain on both in-vocabulary and out-of-vocabulary (OOV) search tasks improve on previous work which incorporated high order phone confusions, the gains for OOV are more impressive.
Citations: 6
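The phone N-gram indexing pipeline above, minus the CRF, can be sketched as a plain overlap score; the toy phone strings are assumptions for illustration. The second query below shows why exact overlap is brittle: a single recognizer confusion (b for p) zeroes the score, which is precisely the gap the CRF-based match is meant to close.

```python
def phone_ngrams(phones, n=3):
    """All phone n-grams of a decoded phone sequence (the index keys)."""
    return {tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)}

def exact_match_score(query_phones, doc_phones, n=3):
    """Baseline approximate match: the fraction of the query's phone
    n-grams present in the indexed document. The paper replaces this
    exact overlap with a CRF score that tolerates context-dependent
    phone confusions made by the recognizer."""
    q = phone_ngrams(query_phones, n)
    return len(q & phone_ngrams(doc_phones, n)) / len(q) if q else 0.0

doc = ["s", "p", "iy", "ch", "r", "eh", "k"]
print(exact_match_score(["s", "p", "iy", "ch"], doc))  # 1.0
print(exact_match_score(["s", "b", "iy", "ch"], doc))  # 0.0
```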
MLP based hierarchical system for task adaptation in ASR
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2009-12-01 DOI: 10.1109/ASRU.2009.5373383
Joel Pinto, M. Magimai.-Doss, H. Bourlard
Abstract: We investigate a multilayer perceptron (MLP) based hierarchical approach for task adaptation in automatic speech recognition. The system consists of two MLP classifiers in tandem. A well-trained MLP available off-the-shelf is used at the first stage of the hierarchy. A second MLP is trained on the posterior features estimated by the first, but with a long temporal context of around 130 ms. By using an MLP trained on 232 hours of conversational telephone speech, the hierarchical adaptation approach yields a word error rate of 1.8% on the 600-word Phonebook isolated word recognition task. This compares favorably to the error rate of 4% obtained by the conventional single MLP based system trained with the same amount of Phonebook data that is used for adaptation. The proposed adaptation scheme also benefits from the ability of the second MLP to model the temporal information in the posterior features.
Citations: 15
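The second MLP's long temporal context amounts to stacking each frame's posterior vector with its neighbours. A minimal sketch; the 10 ms frame rate (so 130 ms is roughly 13 frames, i.e. a context of 6 on each side) and the zero-padding at utterance edges are assumptions for illustration.

```python
def stack_posterior_context(posteriors, context=6):
    """Build the second MLP's input in the tandem hierarchy: each
    frame's posterior vector is concatenated with its `context`
    neighbours on either side (2*6+1 = 13 frames, roughly 130 ms at
    a 10 ms frame shift). Out-of-range neighbours are zero-padded."""
    dim = len(posteriors[0])
    padding = [0.0] * dim
    stacked = []
    for t in range(len(posteriors)):
        window = []
        for dt in range(-context, context + 1):
            idx = t + dt
            window.extend(posteriors[idx] if 0 <= idx < len(posteriors) else padding)
        stacked.append(window)
    return stacked

# Three frames of 2-class posteriors, stacked with +/-1 frame of context.
frames = [[0.7, 0.3], [0.1, 0.9], [0.5, 0.5]]
out = stack_posterior_context(frames, context=1)
print(len(out), len(out[0]))  # 3 6
```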
Scaling shrinkage-based language models
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2009-12-01 DOI: 10.1109/ASRU.2009.5373380
Stanley F. Chen, L. Mangu, B. Ramabhadran, R. Sarikaya, A. Sethy
Abstract: In [1], we show that a novel class-based language model, Model M, and the method of regularized minimum discrimination information (rMDI) models outperform comparable methods on moderate amounts of Wall Street Journal data. Both of these methods are motivated by the observation that shrinking the sum of parameter magnitudes in an exponential language model tends to improve performance [2]. In this paper, we investigate whether these shrinkage-based techniques also perform well on larger training sets and on other domains. First, we explain why good performance on large data sets is uncertain, by showing that gains relative to a baseline n-gram model tend to decrease as training set size increases. Next, we evaluate several methods for data/model combination with Model M and rMDI models on limited-scale domains, to uncover which techniques should work best on large domains. Finally, we apply these methods on a variety of medium-to-large-scale domains covering several languages, and show that Model M consistently provides significant gains over existing language models for state-of-the-art systems in both speech recognition and machine translation.
Citations: 41
Weighted finite state transducer based statistical dialog management
2009 IEEE Workshop on Automatic Speech Recognition & Understanding DOI: 10.1109/ASRU.2009.5373350
Chiori Hori, Kiyonori Ohtake, Teruhisa Misu, H. Kashioka, Satoshi Nakamura
Abstract: We proposed a dialog system using a weighted finite-state transducer (WFST) in which user concept and system action tags are the input and output of the transducer, respectively. The WFST-based platform for dialog management enables us to combine various statistical models for dialog management (DM), user input understanding, and system action generation, and then search for the best system action in response to user inputs among multiple hypotheses. To test the potential of the WFST-based DM platform using statistical models, we constructed a dialog system using a human-to-human spoken dialog corpus for hotel reservation, which is annotated with Interchange Format (IF). A scenario WFST and a spoken language understanding (SLU) WFST were obtained from the corpus and then composed together and optimized. We evaluated the detection accuracy of the system next action tags using Mean Reciprocal Ranking (MRR). Finally, we constructed a full WFST-based dialog system by composing SLU, scenario, and sentence generation (SG) WFSTs. Humans read the system responses in natural language and judged the quality of the responses. We confirmed that the WFST-based DM platform was capable of handling various spoken language and scenarios when the user concept and system action tags are consistent and distinguishable.
Citations: 14
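The MRR metric used above to score ranked system-action hypotheses is standard and easy to state precisely: for each turn, take the reciprocal of the rank at which the correct action first appears (zero if it is absent), then average over turns. The toy action tags below are assumptions for illustration.

```python
def mean_reciprocal_rank(ranked_hypotheses, references):
    """MRR over a set of queries/turns: 1/rank of the first correct
    item in each ranked hypothesis list (0 if absent), averaged."""
    total = 0.0
    for hyps, gold in zip(ranked_hypotheses, references):
        for rank, hyp in enumerate(hyps, start=1):
            if hyp == gold:
                total += 1.0 / rank
                break
    return total / len(ranked_hypotheses)

preds = [["ack", "offer"], ["offer", "ack"], ["bye", "ack", "offer"]]
gold = ["ack", "ack", "offer"]
print(round(mean_reciprocal_rank(preds, gold), 3))  # 0.611
```

Here the three turns contribute 1, 1/2, and 1/3 respectively, giving (1 + 0.5 + 0.333...) / 3.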
Generalized likelihood ratio discriminant analysis
2009 IEEE Workshop on Automatic Speech Recognition & Understanding DOI: 10.1109/ASRU.2009.5373395
Muhammad Ali Tahir, G. Heigold, Christian Plahl, R. Schlüter, H. Ney
Abstract: In the past several decades, classifier-independent front-end feature extraction, where the derivation of acoustic features is only lightly associated with the back-end model training or classification, has been prominently used in various pattern recognition tasks, including automatic speech recognition (ASR). In this paper, we present a novel discriminative feature transformation, named generalized likelihood ratio discriminant analysis (GLRDA), on the basis of the likelihood ratio test (LRT). It attempts to seek a lower dimensional feature subspace by making the most confusing situation, described by the null hypothesis, as unlikely to happen as possible without the homoscedastic assumption on class distributions. We also show that the classical linear discriminant analysis (LDA) and its well-known extension, heteroscedastic linear discriminant analysis (HLDA), can be regarded as two special cases of our proposed method. The empirical class confusion information can be further incorporated into GLRDA for better recognition performance. Experimental results demonstrate that GLRDA and its variant can yield moderate performance improvements over HLDA and LDA for the large vocabulary continuous speech recognition (LVCSR) task.
Citations: 1