2009 IEEE Workshop on Automatic Speech Recognition & Understanding: Latest Publications

Lattice-based lexical cues for word fragment detection in conversational speech
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2009-12-01 DOI: 10.1109/ASRU.2009.5373419
Kartik Audhkhasi, P. Georgiou, Shrikanth S. Narayanan
Abstract: Previous approaches to the problem of word fragment detection in speech have focused primarily on acoustic-prosodic features [1], [2]. This paper proposes that the output of a continuous Automatic Speech Recognition (ASR) system can also be used to derive robust lexical features for the task. We hypothesize that the confusion in the word lattice generated by the ASR system can be exploited for detecting word fragments. Two sets of lexical features are proposed: one based on the word confusion, and the other based on the pronunciation confusion between the word hypotheses in the lattice. Classification experiments with a Support Vector Machine (SVM) classifier show that these lexical features perform better than the previously proposed acoustic-prosodic features by around 5.20% (relative) on a corpus chosen from the DARPA Transtac Iraqi-English (San Diego) corpus [3]. A combination of both these feature sets improves the word fragment detection accuracy by 11.50% relative to using just the acoustic-prosodic features.
Citations: 1
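The word-confusion idea above is concrete enough to sketch. Below is a minimal, hypothetical illustration (not the paper's actual feature set): the entropy of the posterior mass competing at one lattice position, which should be higher around fragments, where the recognizer is confused. The `slot_entropy` helper and the example posteriors are assumptions for illustration only.

```python
import math

def slot_entropy(word_posteriors):
    """Entropy (in bits) of the competing word hypotheses at one
    lattice position; higher entropy means more confusion, which the
    paper hypothesizes correlates with word fragments."""
    total = sum(word_posteriors.values())
    probs = [p / total for p in word_posteriors.values()]
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A confident slot vs. a confused one around a possible fragment.
clear = {"hello": 0.95, "hollow": 0.05}
confused = {"fra-": 0.3, "fragment": 0.25, "fraction": 0.25, "frag": 0.2}
print(slot_entropy(clear) < slot_entropy(confused))  # True
```

A per-slot entropy like this could be pooled over a word's time span to produce one feature value per hypothesized word for the SVM.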
Back-off action selection in summary space-based POMDP dialogue systems
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2009-12-01 DOI: 10.1109/ASRU.2009.5373416
Milica Gasic, F. Lefèvre, Filip Jurcícek, Simon Keizer, François Mairesse, Blaise Thomson, Kai Yu, S. Young
Abstract: This paper deals with the issue of invalid state-action pairs in the Partially Observable Markov Decision Process (POMDP) framework, with a focus on real-world tasks where the need for approximate solutions exacerbates this problem. In particular, when modelling dialogue as a POMDP, both the state and the action space must be reduced to smaller scale summary spaces in order to make learning tractable. However, since not all actions are valid in all states, the action proposed by the policy in summary space sometimes leads to an invalid action when mapped back to master space. Some form of back-off scheme must then be used to generate an alternative action. This paper demonstrates how the value function derived during reinforcement learning can be used to order back-off actions in an N-best list. Compared to a simple baseline back-off strategy and to a strategy that extends the summary space to minimise the occurrence of invalid actions, the proposed N-best action selection scheme is shown to be significantly more robust.
Citations: 15
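The N-best back-off scheme described above reduces to a simple loop: rank candidate summary actions by their learned value and take the first one that survives the mapping to master space. A minimal sketch, with toy action names and Q-values that are assumptions for illustration:

```python
def select_backoff_action(summary_actions, q_value, is_valid_in_master):
    """N-best back-off: rank summary actions by the value function
    learned during reinforcement learning, then return the first one
    whose mapping back to master space yields a valid action."""
    for action in sorted(summary_actions, key=q_value, reverse=True):
        if is_valid_in_master(action):
            return action
    raise RuntimeError("no summary action maps to a valid master action")

# Toy example: 'offer' scores highest but is invalid in the current state.
q = {"confirm": 2.1, "offer": 3.4, "request": 1.0}
best = select_backoff_action(q, q.get, lambda a: a != "offer")
print(best)  # confirm
```

The point of the paper is that this value-ordered list degrades gracefully, unlike a fixed fallback action or an enlarged summary space.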
Robust speech recognition using a Small Power Boosting algorithm
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2009-12-01 DOI: 10.1109/ASRU.2009.5373230
Chanwoo Kim, Kshitiz Kumar, R. Stern
Abstract: In this paper, we present a noise robustness algorithm called Small Power Boosting (SPB). We observe that in the spectral domain, time-frequency bins with smaller power are more affected by additive noise. The conventional way of handling this problem is estimating the noise from the test utterance and doing normalization or subtraction. In our work, in contrast, we intentionally boost the power of time-frequency bins with small energy for both the training and testing datasets. Since time-frequency bins with small power no longer exist after this power boosting, the spectral distortion between the clean and corrupt test sets becomes reduced. This type of small power boosting is also highly related to physiological nonlinearity. We observe that when small power boosting is done, suitable weighting smoothing becomes highly important. Our experimental results indicate that this simple idea is very helpful for very difficult noisy environments such as corruption by background music.
Citations: 23
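The core intuition can be sketched with a deliberately simplified stand-in: raise every low-energy bin to a floor tied to the utterance peak, applied identically to training and test data. The paper's actual boosting nonlinearity and the subsequent weight smoothing are more elaborate; the flooring rule and the 1% ratio below are assumptions for illustration.

```python
def boost_small_power(power_spectrum, floor_ratio=0.01):
    """Simplified stand-in for Small Power Boosting: raise every
    time-frequency bin below a fixed fraction of the utterance peak
    up to that floor. Because low-energy bins, the ones most distorted
    by additive noise, are boosted in both training and test data,
    the mismatch between clean and corrupted sets shrinks."""
    peak = max(power_spectrum)
    floor = floor_ratio * peak
    return [max(p, floor) for p in power_spectrum]

bins = [100.0, 0.005, 3.0, 0.2]
print(boost_small_power(bins))  # [100.0, 1.0, 3.0, 1.0]
```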
Dynamic network decoding revisited
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2009-12-01 DOI: 10.1109/ASRU.2009.5372904
H. Soltau, G. Saon
Abstract: We present a dynamic network decoder capable of using large cross-word context models and large n-gram histories. Our method for constructing the search network is designed to process large cross-word context models very efficiently, and we address the optimization of the search network to minimize any overhead during run-time for the dynamic network decoder. The search procedure uses the full LM history for lookahead, and path recombination is done as early as possible. In our systematic comparison to a static FSM based decoder, we find the dynamic decoder can run at comparable speed to the static decoder when large language models are used, while the static decoder performs best for small language models. We discuss the use of very large vocabularies of up to 2.5 million words for both decoding approaches and analyze the effect of weak acoustic models for pruning.
Citations: 50
Support vector machines for noise robust ASR
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2009-12-01 DOI: 10.1109/ASRU.2009.5372913
M. Gales, A. Ragni, H. AlDamarki, C. Gautier
Abstract: Using discriminative classifiers, such as Support Vector Machines (SVMs), in combination with, or as an alternative to, Hidden Markov Models (HMMs) has a number of advantages for difficult speech recognition tasks. For example, the models can exploit dependencies in the observation sequences beyond those captured by HMMs, provided the appropriate form of kernel is used. However, standard SVMs are binary classifiers, and speech is a multi-class problem. Furthermore, to train SVMs to distinguish word pairs requires that each word appears in the training data. This paper examines both of these limitations. Tree-based reduction approaches for multiclass classification are described, as well as some of the issues in applying them to dynamic data, such as speech. To address the training data issues, a simplified version of HMM-based synthesis can be used, which allows data for any word-pair to be generated. These approaches are evaluated on two noise corrupted digit sequence tasks: AURORA 2.0, and actual in-car collected data.
Citations: 31
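The tree-based reduction mentioned above turns one N-way decision into a chain of binary ones. A minimal sketch, where an oracle lambda stands in for a trained binary SVM (the class names and splitting-in-half scheme are illustrative assumptions, not the paper's exact tree construction):

```python
def tree_classify(classes, choose_left_half):
    """Tree-based reduction of multi-class classification to binary
    decisions: split the candidate classes in half, ask a binary
    classifier which half the sample belongs to, and repeat.
    This needs about log2(N) decisions at test time instead of
    evaluating N*(N-1)/2 pairwise SVMs."""
    while len(classes) > 1:
        mid = len(classes) // 2
        left, right = classes[:mid], classes[mid:]
        classes = left if choose_left_half(left, right) else right
    return classes[0]

digits = ["zero", "one", "two", "three"]
# Stand-in binary classifier that always prefers the half holding "two".
print(tree_classify(digits, lambda l, r: "two" in l))  # two
```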
Improved vocabulary independent search with approximate match based on Conditional Random Fields
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2009-12-01 DOI: 10.1109/ASRU.2009.5373323
U. Chaudhari, M. Picheny
Abstract: We investigate the use of Conditional Random Fields (CRF) to model confusions and account for errors in the phonetic decoding derived from Automatic Speech Recognition output. The goal is to improve the accuracy of approximate phonetic match, given query terms and an indexed database of documents, in a vocabulary independent audio search system. Audio data is ingested, segmented, decoded to produce a sequence of phones, and subsequently indexed using phone N-grams. Search is performed by expanding queries into phone sequences and matching against the index. The approximate match score is derived from a CRF, trained on parallel transcripts, which provides a general framework for modeling the errors that a recognition system may make, taking contextual effects into consideration. Our approach differs from other work in the field in that we focus on using CRFs to model context-dependent phone-level confusions, rather than on explicitly modeling parameters of an edit distance. While the results we obtain on both in-vocabulary and out-of-vocabulary (OOV) search tasks improve on previous work which incorporated high order phone confusions, the gains for OOV are more impressive.
Citations: 6
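The phone N-gram indexing pipeline above, minus the CRF, can be sketched as a plain overlap score; the toy phone strings are assumptions for illustration. The second query below shows why exact overlap is brittle: a single recognizer confusion (b for p) zeroes the score, which is precisely the gap the CRF-based match is meant to close.

```python
def phone_ngrams(phones, n=3):
    """All phone n-grams of a decoded phone sequence (the index keys)."""
    return {tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)}

def exact_match_score(query_phones, doc_phones, n=3):
    """Baseline approximate match: the fraction of the query's phone
    n-grams present in the indexed document. The paper replaces this
    exact overlap with a CRF score that tolerates context-dependent
    phone confusions made by the recognizer."""
    q = phone_ngrams(query_phones, n)
    return len(q & phone_ngrams(doc_phones, n)) / len(q) if q else 0.0

doc = ["s", "p", "iy", "ch", "r", "eh", "k"]
print(exact_match_score(["s", "p", "iy", "ch"], doc))  # 1.0
print(exact_match_score(["s", "b", "iy", "ch"], doc))  # 0.0
```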
MLP based hierarchical system for task adaptation in ASR
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2009-12-01 DOI: 10.1109/ASRU.2009.5373383
Joel Pinto, M. Magimai.-Doss, H. Bourlard
Abstract: We investigate a multilayer perceptron (MLP) based hierarchical approach for task adaptation in automatic speech recognition. The system consists of two MLP classifiers in tandem. A well-trained MLP available off-the-shelf is used at the first stage of the hierarchy. A second MLP is trained on the posterior features estimated by the first, but with a long temporal context of around 130 ms. By using an MLP trained on 232 hours of conversational telephone speech, the hierarchical adaptation approach yields a word error rate of 1.8% on the 600-word Phonebook isolated word recognition task. This compares favorably to the error rate of 4% obtained by the conventional single MLP based system trained with the same amount of Phonebook data that is used for adaptation. The proposed adaptation scheme also benefits from the ability of the second MLP to model the temporal information in the posterior features.
Citations: 15
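The second MLP's long temporal context amounts to stacking each frame's posterior vector with its neighbours. A minimal sketch; the 10 ms frame rate (so 130 ms is roughly 13 frames, i.e. a context of 6 on each side) and the zero-padding at utterance edges are assumptions for illustration.

```python
def stack_posterior_context(posteriors, context=6):
    """Build the second MLP's input in the tandem hierarchy: each
    frame's posterior vector is concatenated with its `context`
    neighbours on either side (2*6+1 = 13 frames, roughly 130 ms at
    a 10 ms frame shift). Out-of-range neighbours are zero-padded."""
    dim = len(posteriors[0])
    padding = [0.0] * dim
    stacked = []
    for t in range(len(posteriors)):
        window = []
        for dt in range(-context, context + 1):
            idx = t + dt
            window.extend(posteriors[idx] if 0 <= idx < len(posteriors) else padding)
        stacked.append(window)
    return stacked

# Three frames of 2-class posteriors, stacked with +/-1 frame of context.
frames = [[0.7, 0.3], [0.1, 0.9], [0.5, 0.5]]
out = stack_posterior_context(frames, context=1)
print(len(out), len(out[0]))  # 3 6
```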
Scaling shrinkage-based language models
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2009-12-01 DOI: 10.1109/ASRU.2009.5373380
Stanley F. Chen, L. Mangu, B. Ramabhadran, R. Sarikaya, A. Sethy
Abstract: In [1], we show that a novel class-based language model, Model M, and the method of regularized minimum discrimination information (rMDI) models outperform comparable methods on moderate amounts of Wall Street Journal data. Both of these methods are motivated by the observation that shrinking the sum of parameter magnitudes in an exponential language model tends to improve performance [2]. In this paper, we investigate whether these shrinkage-based techniques also perform well on larger training sets and on other domains. First, we explain why good performance on large data sets is uncertain, by showing that gains relative to a baseline n-gram model tend to decrease as training set size increases. Next, we evaluate several methods for data/model combination with Model M and rMDI models on limited-scale domains, to uncover which techniques should work best on large domains. Finally, we apply these methods on a variety of medium-to-large-scale domains covering several languages, and show that Model M consistently provides significant gains over existing language models for state-of-the-art systems in both speech recognition and machine translation.
Citations: 41
Weighted finite state transducer based statistical dialog management
2009 IEEE Workshop on Automatic Speech Recognition & Understanding DOI: 10.1109/ASRU.2009.5373350
Chiori Hori, Kiyonori Ohtake, Teruhisa Misu, H. Kashioka, Satoshi Nakamura
Abstract: We proposed a dialog system using a weighted finite-state transducer (WFST) in which user concept and system action tags are the input and output of the transducer, respectively. The WFST-based platform for dialog management enables us to combine various statistical models for dialog management (DM), user input understanding, and system action generation, and then search for the best system action in response to user inputs among multiple hypotheses. To test the potential of the WFST-based DM platform using statistical models, we constructed a dialog system using a human-to-human spoken dialog corpus for hotel reservation, which is annotated with Interchange Format (IF). A scenario WFST and a spoken language understanding (SLU) WFST were obtained from the corpus and then composed together and optimized. We evaluated the detection accuracy of the system next action tags using Mean Reciprocal Ranking (MRR). Finally, we constructed a full WFST-based dialog system by composing SLU, scenario, and sentence generation (SG) WFSTs. Humans read the system responses in natural language and judged the quality of the responses. We confirmed that the WFST-based DM platform was capable of handling various spoken language and scenarios when the user concept and system action tags are consistent and distinguishable.
Citations: 14
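The MRR metric used above to score ranked system-action hypotheses is standard and easy to state precisely: for each turn, take the reciprocal of the rank at which the correct action first appears (zero if it is absent), then average over turns. The toy action tags below are assumptions for illustration.

```python
def mean_reciprocal_rank(ranked_hypotheses, references):
    """MRR over a set of queries/turns: 1/rank of the first correct
    item in each ranked hypothesis list (0 if absent), averaged."""
    total = 0.0
    for hyps, gold in zip(ranked_hypotheses, references):
        for rank, hyp in enumerate(hyps, start=1):
            if hyp == gold:
                total += 1.0 / rank
                break
    return total / len(ranked_hypotheses)

preds = [["ack", "offer"], ["offer", "ack"], ["bye", "ack", "offer"]]
gold = ["ack", "ack", "offer"]
print(round(mean_reciprocal_rank(preds, gold), 3))  # 0.611
```

Here the three turns contribute 1, 1/2, and 1/3 respectively, giving (1 + 0.5 + 0.333...) / 3.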
Generalized likelihood ratio discriminant analysis
2009 IEEE Workshop on Automatic Speech Recognition & Understanding DOI: 10.1109/ASRU.2009.5373395
Muhammad Ali Tahir, G. Heigold, Christian Plahl, R. Schlüter, H. Ney
Abstract: In the past several decades, classifier-independent front-end feature extraction, where the derivation of acoustic features is only lightly associated with the back-end model training or classification, has been prominently used in various pattern recognition tasks, including automatic speech recognition (ASR). In this paper, we present a novel discriminative feature transformation, named generalized likelihood ratio discriminant analysis (GLRDA), on the basis of the likelihood ratio test (LRT). It attempts to seek a lower dimensional feature subspace by making the most confusing situation, described by the null hypothesis, as unlikely to happen as possible without the homoscedastic assumption on class distributions. We also show that the classical linear discriminant analysis (LDA) and its well-known extension, heteroscedastic linear discriminant analysis (HLDA), can be regarded as two special cases of our proposed method. The empirical class confusion information can be further incorporated into GLRDA for better recognition performance. Experimental results demonstrate that GLRDA and its variant can yield moderate performance improvements over HLDA and LDA for the large vocabulary continuous speech recognition (LVCSR) task.
Citations: 1