2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) — Latest Publications

Robust speech recognition by properly utilizing reliable frames and segments in corrupted signals
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) | Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430091
Yi Chen, C. Wan, Lin-Shan Lee
Abstract: In this paper, we propose a new approach to detecting and utilizing reliable frames and segments in corrupted signals for robust speech recognition. Novel approaches to estimating an energy-based measure and a harmonicity measure for each frame are developed. SNR-dependent GMM classifiers are then trained, together with a reliable frame selection and clustering module and a reliable segment identification module, to detect the most reliable frames in an utterance. The reliable frames and segments thus obtained can be properly used in both front-end feature enhancement and back-end Viterbi decoding. In the extensive experiments reported here, very significant improvements in recognition accuracy were obtained with the proposed approaches for all noise types and all SNR values defined in the Aurora 2 database.
Citations: 1
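The energy-based reliability measure described above can be conveyed with a much simpler stand-in. The paper trains SNR-dependent GMM classifiers; the percentile noise-floor estimate and the `margin_db` threshold below are illustrative assumptions, not the authors' method:

```python
def reliable_frames(log_energies_db, margin_db=9.0):
    """Flag frames whose log-energy rises a fixed margin above an
    estimated noise floor (here: the 10th-percentile frame energy).
    A crude stand-in for the paper's SNR-dependent GMM classifiers."""
    ordered = sorted(log_energies_db)
    noise_floor = ordered[len(ordered) // 10]
    return [e >= noise_floor + margin_db for e in log_energies_db]

# silence-ish frames near 0 dB, speech-ish frames near 20 dB
energies = [0.0, 1.0, 0.0, 20.0, 21.0, 19.0, 0.0, 1.0, 22.0, 0.0]
mask = reliable_frames(energies)
print(mask.count(True))  # 4 frames marked reliable
```

A real detector would also enforce the paper's segment-level constraints (clustering adjacent reliable frames) rather than thresholding frames independently.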
Building a highly accurate Mandarin speech recognizer
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) | Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430161
M. Hwang, Gang Peng, Wen Wang, Arlo Faria, A. Heidel, Mari Ostendorf
Abstract: We describe a highly accurate large-vocabulary continuous Mandarin speech recognizer, a collaborative effort among four research organizations. In particular, we build two acoustic models (AMs) with significant differences but similar accuracy for the purposes of cross adaptation and system combination. This paper elaborates on the main differences between the two systems, where one recognizer incorporates a discriminatively trained feature while the other utilizes a discriminative feature transformation. Additionally, we present an improved acoustic segmentation algorithm and topic-based language model (LM) adaptation. Coupled with increased acoustic training data, we reduced the character error rate (CER) on the DARPA GALE 2006 evaluation set from 18.4% to 15.3%.
Citations: 28
Minimum mutual information beamforming for simultaneous active speakers
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) | Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430086
K. Kumatani, U. Mayer, Tobias Gehrig, Emilian Stoimenov, J. McDonough, Matthias Wölfel
Abstract: In this work, we address an acoustic beamforming application where two speakers are simultaneously active. We construct one subband-domain beamformer in generalized sidelobe canceller (GSC) configuration for each source. In contrast to normal practice, we then jointly adjust the active weight vectors of both GSCs to obtain two output signals with minimum mutual information (MMI). In order to calculate the mutual information of the complex subband snapshots, we consider four probability density functions (pdfs), namely the Gaussian, Laplace, K0, and Γ pdfs. The latter three belong to the class of super-Gaussian density functions that are typically used in independent component analysis, as opposed to conventional beamforming. We demonstrate the effectiveness of our proposed technique through a series of far-field automatic speech recognition experiments on data from the PASCAL Speech Separation Challenge. In the experiments, the delay-and-sum beamformer achieved a word error rate (WER) of 70.4%. The MMI beamformer under a Gaussian assumption achieved 55.2% WER, which was further reduced to 52.0% with a K0 pdf, whereas the WER for data recorded with a close-talking microphone was 21.6%.
Citations: 6
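The delay-and-sum baseline quoted above (70.4% WER) is simple enough to sketch. This minimal version assumes a single steering direction and integer-sample delays; real systems, including the paper's, operate on fractional delays in the subband domain, and the MMI-GSC beamformer itself is far more involved:

```python
def delay_and_sum(channels, delays):
    """Align each microphone channel by its (integer) propagation
    delay and average across microphones, reinforcing the source
    the delays are steered toward."""
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[d + t] for ch, d in zip(channels, delays)) / len(channels)
            for t in range(usable)]

# a source signal, and a copy of it arriving two samples later at mic 2
sig = [1.0, 2.0, 3.0, 4.0]
mic1, mic2 = sig, [0.0, 0.0] + sig
print(delay_and_sum([mic1, mic2], delays=[0, 2]))  # [1.0, 2.0, 3.0, 4.0]
```

Steering the delays at one speaker attenuates the other only weakly, which is exactly the shortcoming the jointly optimized MMI weights address.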
Development of the 2007 RWTH Mandarin LVCSR system
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) | Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430155
Björn Hoffmeister, Christian Plahl, P. Fritz, G. Heigold, J. Lööf, R. Schlüter, H. Ney
Abstract: This paper describes the development of the RWTH Mandarin LVCSR system. Different acoustic front-ends together with multiple system cross-adaptation are used in a two-stage decoding framework. We describe the system in detail and present systematic recognition results. In particular, we compare a variety of approaches for cross-adapting to multiple systems. During development we conducted a comparative study of different methods for integrating tone and phoneme posterior features. Furthermore, we apply lattice-based consensus decoding and system combination methods, comparing the effect of minimizing character rather than word errors. The final system obtains a character error rate of 17.7% on the GALE 2006 evaluation data.
Citations: 23
Adapting grapheme-to-phoneme conversion for name recognition
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) | Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430097
Xiao Li, A. Gunawardana, A. Acero
Abstract: This work investigates the use of acoustic data to improve grapheme-to-phoneme conversion for name recognition. We introduce a joint model of acoustics and graphonemes, and present two approaches, maximum likelihood training and discriminative training, for adapting graphoneme model parameters. Experiments on a large-scale voice-dialing system show that the maximum likelihood approach yields a relative 7% reduction in SER compared to the best baseline result we obtained without leveraging acoustic data, while discriminative training enlarges the SER reduction to 12%.
Citations: 25
Interpolative variable frame rate transmission of speech features for distributed speech recognition
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) | Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430179
Huiqun Deng, D. O'Shaughnessy, Jean-Guy Dahan, W. Ganong
Abstract: In distributed speech recognition, vector quantization is used to reduce the number of bits for coding speech features at the user end, in order to save energy when transmitting speech feature streams to remote recognizers and to reduce data traffic congestion. We observe that the overall bit rate of the transmitted feature streams could be further reduced by not sending redundant frames that can be interpolated at the remote server from received frames. Interpolation introduces errors and may degrade speech recognition. This paper investigates methods of selecting frames for transmission and the effect of interpolation on recognition. Experiments on a large-vocabulary recognizer show that with spline interpolation, the overall frame rate for transmission can be reduced by about 50% with a relative increase in word error rate of less than 5.2% for clean and noisy speech.
Citations: 6
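The decimate-then-interpolate idea above can be sketched in a few lines. Linear interpolation stands in here for the spline used in the paper, and the keep-every-other-frame policy is an illustrative assumption (the paper studies smarter frame-selection methods):

```python
def decimate_and_interpolate(frames, keep_every=2):
    """Transmit only every `keep_every`-th frame (plus the final one),
    then rebuild dropped frames at the server by interpolating between
    the nearest received frames. Linear interpolation stands in for
    the paper's spline."""
    n = len(frames)
    kept_idx = sorted(set(range(0, n, keep_every)) | {n - 1})
    kept = {i: frames[i] for i in kept_idx}
    recon = []
    for i in range(n):
        if i in kept:
            recon.append(list(kept[i]))
            continue
        lo = max(j for j in kept_idx if j < i)   # nearest received frame before i
        hi = min(j for j in kept_idx if j > i)   # nearest received frame after i
        w = (i - lo) / (hi - lo)
        recon.append([(1 - w) * a + w * b for a, b in zip(kept[lo], kept[hi])])
    return recon, len(kept_idx) / n              # reconstruction, transmitted-frame ratio

# toy 2-dim feature stream that varies linearly, so reconstruction is exact
stream = [[float(t), 2.0 * t] for t in range(10)]
recon, ratio = decimate_and_interpolate(stream, keep_every=2)
print(ratio)  # 0.6: frames 0, 2, 4, 6, 8 plus the final frame 9
```

On real MFCC trajectories the interpolation error is nonzero, which is the recognition-accuracy trade-off the experiments quantify.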
Recognition and understanding of meetings: the AMI and AMIDA projects
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) | Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430116
S. Renals, Thomas Hain, H. Bourlard
Abstract: The AMI and AMIDA projects are concerned with the recognition and interpretation of multiparty meetings. Within these projects we have: developed an infrastructure for recording meetings using multiple microphones and cameras; released a 100-hour annotated corpus of meetings; developed techniques for the recognition and interpretation of meetings based primarily on speech recognition and computer vision; and developed an evaluation framework at both component and system levels. In this paper we present an overview of these projects, with an emphasis on speech recognition and content extraction.
Citations: 144
Voice/audio information retrieval: minimizing the need for human ears
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) | Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430183
M. Clements, M. Gavaldà
Abstract: This paper discusses the challenges of building information retrieval applications that operate on large amounts of voice/audio data. Various problems and issues are presented along with proposed solutions. A set of techniques based on a phonetic keyword-spotting approach is presented, together with examples of concrete applications that solve real-life problems.
Citations: 3
Unsupervised state clustering for stochastic dialog management
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) | Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430171
F. Lefèvre, R. Mori
Abstract: Following recent studies in stochastic dialog management, this paper introduces an unsupervised approach aimed at reducing the cost and complexity of setting up a probabilistic POMDP-based dialog manager. The proposed method is based on a first decoding step that derives basic semantic constituents from user utterances. These isolated units and some relevant context features (such as previous system actions and previous user utterances) are combined to form vectors representing the ongoing dialog states. After a clustering step, each partition of this space is intended to represent a particular dialog state. Any new utterance can then be classified according to these automatic states, and the belief state can be updated before the POMDP-based dialog manager decides on the best next action to perform. The proposed approach is applied to the French MEDIA task (tourist information and hotel booking). The MEDIA 10k-utterance training corpus is semantically rich (over 80 basic concepts) and is segmentally annotated in terms of basic concepts. Before user trials can be carried out, some insights into the method's effectiveness are obtained by analyzing the convergence of the POMDP models.
Citations: 8
Speech enhancement using PCA and variance of the reconstruction error in distributed speech recognition
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) | Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430077
Amin Haji Abolhassani, S. Selouani, D. O'Shaughnessy
Abstract: We present in this paper a signal subspace-based approach for enhancing a noisy signal. The algorithm is based on principal component analysis (PCA), in which the optimal subspace selection is provided by a variance of the reconstruction error (VRE) criterion. This choice overcomes many limitations of other selection criteria, such as over-estimation of the signal subspace or the need for empirical parameters. We also extend our subspace algorithm to handle colored and babble noise. The performance evaluation, conducted on the Aurora database, measures improvements in the distributed speech recognition of noisy signals corrupted by different types of additive noise. Our algorithm succeeds in improving the recognition of noisy speech in all noise conditions.
Citations: 13
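The core of the subspace approach above is projecting noisy frames onto a low-rank signal subspace. A minimal sketch follows; note that the paper selects the subspace dimension automatically via the VRE criterion, whereas here `k` is fixed by hand, and the rank-1 toy signal is an illustrative assumption:

```python
import numpy as np

def pca_denoise(frames, k):
    """Keep only the top-k principal components of the noisy feature
    matrix: eigendirections with small variance are assumed to carry
    mostly noise and are discarded on reconstruction."""
    X = np.asarray(frames, dtype=float)
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / len(X)           # sample covariance
    _, vecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    basis = vecs[:, -k:]               # estimated signal subspace
    return (Xc @ basis) @ basis.T + mu

rng = np.random.default_rng(0)
clean = np.outer(np.sin(np.linspace(0.0, 6.0, 200)), np.ones(8))  # rank-1 "speech"
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
denoised = pca_denoise(noisy, k=1)
print(np.mean((denoised - clean) ** 2) < np.mean((noisy - clean) ** 2))  # True
```

Choosing `k` too large re-admits noise and too small discards speech energy, which is precisely the trade-off the VRE criterion is designed to resolve without hand tuning.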