2013 IEEE Workshop on Automatic Speech Recognition and Understanding: Latest Publications

Search results based N-best hypothesis rescoring with maximum entropy classification
2013 IEEE Workshop on Automatic Speech Recognition and Understanding · Pub Date: 2013-12-01 · DOI: 10.1109/ASRU.2013.6707767
Fuchun Peng, Scott Roy, B. Shahshahani, F. Beaufays
Abstract: We propose a simple yet effective method for improving speech recognition by reranking the N-best speech recognition hypotheses using search results. We model N-best reranking as a binary classification problem and select the hypothesis with the highest classification confidence. We use query-specific features extracted from the search results to encode domain knowledge and combine them with a maximum entropy classifier to rescore the N-best list. We show that by rescoring even only the top two hypotheses, we obtain a significant 3% absolute sentence accuracy (SACC) improvement over a strong baseline on production traffic from an entertainment domain.
Citations: 20
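For readers who want a concrete picture of the reranking scheme above, here is a minimal sketch, assuming the "hypothesis is correct" classifier is realised as logistic regression (a standard form of maximum entropy classification); the feature names and training data are invented for illustration and are not the features used in the paper.

```python
# Illustrative sketch only: rerank N-best ASR hypotheses with a maximum-entropy
# (logistic regression) classifier over search-result features. Feature names
# and data are hypothetical; the paper's features come from its search backend.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: query-specific features for one hypothesis, e.g.
# [ASR confidence, number of search results returned, top-result match score].
X_train = np.array([
    [0.92, 120, 0.8],   # hypothesis whose search results look good  -> correct
    [0.85,   3, 0.1],   # hypothesis with poor search results        -> incorrect
    [0.70,  95, 0.7],
    [0.88,   0, 0.0],
])
y_train = np.array([1, 0, 1, 0])  # 1 = hypothesis judged correct

# Binary logistic regression is one standard realisation of a maximum-entropy
# classifier.
clf = LogisticRegression().fit(X_train, y_train)

def rerank(nbest):
    """nbest: list of (hypothesis_text, feature_vector); return best hypothesis."""
    feats = np.array([f for _, f in nbest])
    confidence = clf.predict_proba(feats)[:, 1]   # P(correct | features)
    return nbest[int(np.argmax(confidence))][0]

print(rerank([("play frozen soundtrack",  [0.81, 150, 0.9]),
              ("play frozen sound check", [0.86,  12, 0.2])]))
```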
The IBM keyword search system for the DARPA RATS program
2013 IEEE Workshop on Automatic Speech Recognition and Understanding · Pub Date: 2013-12-01 · DOI: 10.1109/ASRU.2013.6707730
L. Mangu, H. Soltau, H. Kuo, G. Saon
Abstract: The paper describes a state-of-the-art keyword search (KWS) system in which significant improvements are obtained by using Convolutional Neural Network acoustic models, a two-step speech segmentation approach and a simplified ASR architecture optimized for KWS. The system described in this paper had the best performance in the 2013 DARPA RATS evaluation for both Levantine and Farsi.
Citations: 4
Learning filter banks within a deep neural network framework
2013 IEEE Workshop on Automatic Speech Recognition and Understanding · Pub Date: 2013-12-01 · DOI: 10.1109/ASRU.2013.6707746
Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, B. Ramabhadran
Abstract: Mel-filter banks are commonly used in speech recognition, as they are motivated by theory related to speech production and perception. While features derived from mel-filter banks are quite popular, we argue that this filter bank is not really an appropriate choice, as it is not learned for the objective at hand, i.e. speech recognition. In this paper, we explore replacing the filter bank with a filter bank layer that is learned jointly with the rest of a deep neural network. Thus, the filter bank is learned to minimize cross-entropy, which is more closely tied to the speech recognition objective. On a 50-hour English Broadcast News task, we show that we can achieve a 5% relative improvement in word error rate (WER) using the filter bank learning approach, compared to having a fixed set of filters.
Citations: 170
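A minimal sketch of the core idea, a filter bank layer trained jointly with the network under cross-entropy; the layer sizes, non-negativity handling and dummy data below are assumptions for illustration, not the paper's configuration.

```python
# Sketch: a learnable (non-negative) filter bank applied to the power spectrum,
# followed by log compression and a small classifier, trained with cross-entropy.
import torch
import torch.nn as nn

class LearnableFilterbankNet(nn.Module):
    def __init__(self, n_fft_bins=257, n_filters=40, n_states=500):
        super().__init__()
        # Filter bank weights; kept non-negative by clamping in forward().
        self.fbank = nn.Parameter(torch.rand(n_fft_bins, n_filters) * 0.01)
        self.dnn = nn.Sequential(
            nn.Linear(n_filters, 256), nn.Sigmoid(),
            nn.Linear(256, n_states),
        )

    def forward(self, power_spec):              # (batch, n_fft_bins)
        weights = self.fbank.clamp(min=0.0)     # non-negative filters
        energies = power_spec @ weights         # filter bank energies
        features = torch.log(energies + 1e-6)   # log compression, as with mel features
        return self.dnn(features)               # HMM-state logits

model = LearnableFilterbankNet()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Dummy batch: random power spectra and state targets, for illustration only.
spec = torch.rand(8, 257)
target = torch.randint(0, 500, (8,))
loss = loss_fn(model(spec), target)
loss.backward()     # gradients flow into the filter bank as well as the DNN
optim.step()
```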
Learning a subword vocabulary based on unigram likelihood
2013 IEEE Workshop on Automatic Speech Recognition and Understanding · Pub Date: 2013-12-01 · DOI: 10.1109/ASRU.2013.6707697
Matti Varjokallio, M. Kurimo, Sami Virpioja
Abstract: Using words as vocabulary units for tasks like speech recognition is infeasible for many morphologically rich languages, including Finnish. Thus, subword units are commonly used for language modeling. This work presents a novel algorithm for creating a subword vocabulary, based on the unigram likelihood of a text corpus. The method is evaluated with an entropy measure and a Finnish LVCSR task. Unigram entropy of the text corpus is shown to be a good indicator of the quality of higher-order n-gram models, also resulting in high speech recognition accuracy.
Citations: 19
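A small sketch of the underlying criterion, assuming a toy corpus and toy unit counts: score a candidate subword vocabulary by the unigram likelihood of the corpus under its best segmentation, and drop the multi-character unit whose removal costs the least likelihood. This only illustrates the objective, not the authors' algorithm.

```python
# Sketch: unigram likelihood of a corpus under Viterbi segmentation, used to
# compare candidate subword vocabularies. Corpus and counts are toy examples.
import math
from collections import Counter

def viterbi_loglik(word, logprob):
    """Best unigram log-likelihood of segmenting `word` with the given units."""
    best = [0.0] + [-math.inf] * len(word)
    for end in range(1, len(word) + 1):
        for start in range(end):
            piece = word[start:end]
            if piece in logprob and best[start] + logprob[piece] > best[end]:
                best[end] = best[start] + logprob[piece]
    return best[-1]

def corpus_loglik(corpus, vocab_counts):
    total = sum(vocab_counts.values())
    logprob = {u: math.log(c / total) for u, c in vocab_counts.items()}
    return sum(viterbi_loglik(w, logprob) for w in corpus)

corpus = ["talossa", "talo", "talossakin"]        # toy Finnish-like word list
vocab = Counter({"t": 5, "a": 8, "l": 4, "o": 5, "s": 4, "k": 1, "i": 1, "n": 1,
                 "talo": 3, "ssa": 2, "kin": 1})

# Greedily identify the multi-character unit that is cheapest to prune.
base = corpus_loglik(corpus, vocab)
losses = {}
for unit in [u for u in vocab if len(u) > 1]:
    reduced = Counter({u: c for u, c in vocab.items() if u != unit})
    losses[unit] = base - corpus_loglik(corpus, reduced)
print("cheapest unit to prune:", min(losses, key=losses.get))
```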
Acoustic unit discovery and pronunciation generation from a grapheme-based lexicon
2013 IEEE Workshop on Automatic Speech Recognition and Understanding · Pub Date: 2013-12-01 · DOI: 10.1109/ASRU.2013.6707760
William Hartmann, A. Roy, L. Lamel, J. Gauvain
Abstract: We present a framework for discovering acoustic units and generating an associated pronunciation lexicon from an initial grapheme-based recognition system. Our approach consists of two distinct contributions. First, context-dependent grapheme models are clustered using a spectral clustering approach to create a set of phone-like acoustic units. Next, we transform the pronunciation lexicon using a statistical machine translation-based approach. Pronunciation hypotheses generated from a decoding of the training set are used to create a phrase-based translation table. We propose a novel method for scoring the phrase-based rules that significantly improves the output of the transformation process. Results on an English language dataset demonstrate that the combined methods provide a 13% relative reduction in word error rate compared to a baseline grapheme-based system. Our approach could potentially be applied to low-resource languages without existing lexicons, such as in the Babel project.
Citations: 19
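To illustrate just the clustering step, the sketch below runs spectral clustering on a precomputed similarity matrix between context-dependent grapheme models; the random affinity matrix and cluster count are placeholders, not the paper's setup.

```python
# Sketch: group context-dependent grapheme models into phone-like acoustic units
# via spectral clustering of a precomputed model-similarity (affinity) matrix.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
n_grapheme_models = 60

# Symmetric non-negative affinity matrix; in practice this would come from a
# distance between the grapheme models' acoustic distributions.
A = rng.random((n_grapheme_models, n_grapheme_models))
affinity = (A + A.T) / 2
np.fill_diagonal(affinity, 1.0)

labels = SpectralClustering(
    n_clusters=20,            # target number of phone-like acoustic units
    affinity="precomputed",
    assign_labels="kmeans",
    random_state=0,
).fit_predict(affinity)

for unit_id in range(3):
    members = np.flatnonzero(labels == unit_id)
    print(f"acoustic unit {unit_id}: grapheme models {members.tolist()}")
```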
The second ‘CHiME’ speech separation and recognition challenge: An overview of challenge systems and outcomes
2013 IEEE Workshop on Automatic Speech Recognition and Understanding · Pub Date: 2013-12-01 · DOI: 10.1109/ASRU.2013.6707723
Emmanuel Vincent, Jon Barker, Shinji Watanabe, Jonathan Le Roux, Francesco Nesta, Marco Matassoni
Abstract: Distant-microphone automatic speech recognition (ASR) remains a challenging goal in everyday environments involving multiple background sources and reverberation. This paper reports on the results of the 2nd ‘CHiME’ Challenge, an initiative designed to analyse and evaluate the performance of ASR systems in a real-world domestic environment. We discuss the rationale for the challenge and provide a summary of the datasets, tasks and baseline systems. The paper overviews the systems that were entered for the two challenge tracks: small-vocabulary with moving talker and medium-vocabulary with stationary talker. We present a summary of the challenge findings, including novel results produced by challenge system combination. Possible directions for future challenges are discussed.
Citations: 94
Language style and domain adaptation for cross-language SLU porting
2013 IEEE Workshop on Automatic Speech Recognition and Understanding · Pub Date: 2013-12-01 · DOI: 10.1109/ASRU.2013.6707720
Evgeny A. Stepanov, Ilya Kashkarev, Ali Orkan Bayer, G. Riccardi, Arindam Ghosh
Abstract: Automatic cross-language Spoken Language Understanding (SLU) porting is plagued by two limitations. First, SLU models are usually trained on limited-domain corpora. Second, language-pair resources (e.g. aligned corpora) are scarce or unmatched in style (e.g. news vs. conversation). We present experiments on automatic style adaptation of the input to the translation systems and of their output for SLU. We approach the problem of scarce aligned data by adapting the available parallel data to the target domain using limited in-domain and larger web-crawled close-to-domain corpora. SLU performance is optimized by reranking its output with a Recurrent Neural Network-based joint language model. We evaluate end-to-end SLU porting on close and distant language pairs, Spanish-Italian and Turkish-Italian, and achieve significant improvements in both translation quality and SLU performance.
Citations: 16
A propagation approach to modelling the joint distributions of clean and corrupted speech in the Mel-Cepstral domain
2013 IEEE Workshop on Automatic Speech Recognition and Understanding · Pub Date: 2013-12-01 · DOI: 10.1109/ASRU.2013.6707726
Ramón Fernández Astudillo
Abstract: This paper presents a closed-form solution relating the joint distributions of corrupted and clean speech in the short-time Fourier Transform (STFT) and Mel-Frequency Cepstral Coefficient (MFCC) domains. This makes possible a tighter integration of STFT-domain speech enhancement with feature- and model-compensation techniques for robust automatic speech recognition. The approach directly utilizes the conventional speech distortion model for STFT speech enhancement, allowing for low-cost, single-pass, causal implementations. Compared to similar uncertainty propagation approaches, it provides the full joint distribution, rather than just the posterior distribution, which offers additional model-compensation possibilities. The method is exemplified by deriving an MMSE-MFCC estimator from the propagated joint distribution. It is shown that performance similar to that of STFT uncertainty propagation (STFT-UP) can be obtained on AURORA4, while deriving the full joint distribution.
Citations: 2
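As a hedged reminder of the quantities involved (the notation below is assumed for illustration, not taken from the paper): the MFCC vector is a deterministic function of the clean STFT frame, and an MMSE estimator in the cepstral domain is the expectation of that function under the posterior propagated from the corrupted observation.

```latex
% Assumed notation: X = clean STFT frame, Y = corrupted STFT frame,
% M = mel filter bank matrix, D = DCT matrix.
\begin{aligned}
  \mathbf{c}(X) &= D \log\!\bigl(M\,|X|^{2}\bigr)
     && \text{(MFCC as a deterministic function of the clean STFT)}\\
  \hat{\mathbf{c}}_{\mathrm{MMSE}} &= \mathrm{E}\bigl[\mathbf{c}(X) \mid Y\bigr]
     = \int \mathbf{c}(x)\, p(x \mid Y)\, dx
     && \text{(MMSE-MFCC estimate under the propagated posterior)}
\end{aligned}
```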
Multi-stream temporally varying weight regression for cross-lingual speech recognition
2013 IEEE Workshop on Automatic Speech Recognition and Understanding · Pub Date: 2013-12-01 · DOI: 10.1109/ASRU.2013.6707769
Shilin Liu, K. Sim
Abstract: Building a good Automatic Speech Recognition (ASR) system with limited resources is a very challenging task due to the many existing speech variations. Multilingual and cross-lingual speech recognition techniques are commonly used for this task. This paper investigates the recently proposed Temporally Varying Weight Regression (TVWR) method for cross-lingual speech recognition. TVWR uses posterior features to implicitly model the long-term temporal structures in acoustic patterns. By leveraging well-trained foreign recognizers, high-quality monophone/state posteriors can be easily incorporated into TVWR to boost ASR performance on low-resource languages. Furthermore, multi-stream TVWR is proposed, where multiple sets of posterior features are used to incorporate richer (temporal and spatial) context information. Finally, a separate state-tying for the TVWR regression parameters is used to better utilize the more reliable posterior features. Experimental results are evaluated for English and Malay speech recognition with limited resources. By using the Czech, Hungarian and Russian posterior features, TVWR was found to consistently outperform the tandem systems trained on the same features.
Citations: 3
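For orientation, one generic way to write a time-varying mixture-weight model of this kind is sketched below; h_t denotes the frame's posterior-feature vector from the foreign recognizer, and this parameterisation is an illustrative assumption rather than the paper's exact formulation.

```latex
% Illustrative form only: the static mixture weights c_{jm} of state j are
% modulated per frame by a regression on the posterior features h_t and
% renormalised; the paper's exact regression may differ.
p(\mathbf{o}_t \mid j)
  = \sum_{m} c_{jm}(t)\,
    \mathcal{N}\!\bigl(\mathbf{o}_t ; \boldsymbol{\mu}_{jm}, \boldsymbol{\Sigma}_{jm}\bigr),
\qquad
c_{jm}(t)
  = \frac{c_{jm}\,\mathbf{w}_{jm}^{\top}\mathbf{h}_t}
         {\sum_{m'} c_{jm'}\,\mathbf{w}_{jm'}^{\top}\mathbf{h}_t}
```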
Cross-lingual context sharing and parameter-tying for multi-lingual speech recognition
2013 IEEE Workshop on Automatic Speech Recognition and Understanding · Pub Date: 2013-12-01 · DOI: 10.1109/ASRU.2013.6707717
Aanchan Mohan, R. Rose
Abstract: This paper is concerned with the problem of building acoustic models for automatic speech recognition (ASR) using speech data from multiple languages. Techniques for multi-lingual ASR are developed in the context of the subspace Gaussian mixture model (SGMM) [2, 3]. Multi-lingual SGMM-based ASR systems have been configured with shared subspace parameters trained from multiple languages but with distinct language-dependent phonetic contexts and states [11, 12]. First, an approach for sharing state-level target-language and foreign-language SGMM parameters is described. Second, semi-tied covariance transformations are applied as an alternative to full-covariance Gaussians to make acoustic model training less sensitive to issues of insufficient training data. These techniques are applied to Hindi and Marathi language data obtained for an agricultural commodities dialog task in multiple Indian languages.
Citations: 2
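For context, the basic SGMM state likelihood (summarised here from the general SGMM literature rather than from this paper) shows which parameters are globally shared and which remain state- and language-specific:

```latex
% Basic SGMM state likelihood (without substates). The subspace parameters
% M_i, w_i and Sigma_i are globally shared and can be trained on pooled
% multilingual data; the state vectors v_j remain language dependent.
p(\mathbf{x} \mid j)
  = \sum_{i=1}^{I} w_{ji}\,
    \mathcal{N}\!\bigl(\mathbf{x} ; \boldsymbol{\mu}_{ji}, \boldsymbol{\Sigma}_i\bigr),
\qquad
\boldsymbol{\mu}_{ji} = \mathbf{M}_i \mathbf{v}_j,
\qquad
w_{ji} = \frac{\exp\bigl(\mathbf{w}_i^{\top}\mathbf{v}_j\bigr)}
              {\sum_{i'=1}^{I}\exp\bigl(\mathbf{w}_{i'}^{\top}\mathbf{v}_j\bigr)}
```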