2009 IEEE Workshop on Automatic Speech Recognition & Understanding: Latest Publications

Detection of OOV words by combining acoustic confidence measures with linguistic features
2009 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2009-12-13. DOI: 10.1109/ASRU.2009.5372877
F. Stouten, D. Fohr, I. Illina
Abstract: This paper describes the design of an out-of-vocabulary (OOV) word detector. Such a system detects segments in the output of an LVCSR system that correspond to OOV words, i.e. words not included in the lexicon. The OOV detector uses acoustic confidence measures derived from several systems: a word recognizer constrained by a lexicon, a phone recognizer constrained by a grammar, and a phone recognizer without constraints. In addition, it uses several linguistic features. Experimental results on a French broadcast news transcription task show that, for this approach, precision equals recall at 35%.
Citations: 5
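A minimal sketch of the flavor of acoustic confidence measure described above: a duration-normalized log-likelihood ratio between a constrained decode and the unconstrained phone decode, fused with a linguistic feature through a simple logistic combination. This is an illustration in Python, not the paper's actual detector; the function names, weights, and numbers are hypothetical.

```python
import numpy as np

def acoustic_confidence(ll_constrained, ll_free, n_frames):
    # Duration-normalized log-likelihood ratio between a constrained decode
    # (word recognizer or grammar-constrained phone recognizer) and the
    # unconstrained phone recognizer over one hypothesized word segment.
    # Low values mean the lexicon explains the segment poorly: a possible OOV.
    return (ll_constrained - ll_free) / max(n_frames, 1)

def oov_score(features, weights, bias=0.0):
    # Logistic combination of acoustic confidences and linguistic features
    # into a single OOV detection score in [0, 1].
    z = float(np.dot(weights, features) + bias)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical segment where both constrained decodes fit worse than the free phone loop.
feats = np.array([acoustic_confidence(-520.0, -480.0, 35),   # word recognizer vs. free phones
                  acoustic_confidence(-505.0, -480.0, 35),   # grammar-phone vs. free phones
                  1.0])                                       # e.g. a language-model back-off flag
weights = np.array([-2.0, -1.0, 0.5])                         # illustrative, not trained values
print(f"OOV score: {oov_score(feats, weights):.3f}")
```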
Active learning for rule-based and corpus-based Spoken Language Understanding models
2009 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2009-12-01. DOI: 10.1109/ASRU.2009.5373377
Pierre Gotab, Frédéric Béchet, Géraldine Damnati
Abstract: Active learning can be used for the maintenance of a deployed Spoken Dialog System (SDS) that evolves over time and for which large collections of dialog traces can be gathered daily. At the Spoken Language Understanding (SLU) level this maintenance process is crucial, since a deployed SDS evolves quickly as services are added, modified, or dropped. Knowledge-based approaches, based on manually written grammars or inference rules, are often preferred because system designers can directly modify the SLU models to account for such service changes, even when little or no related data has been collected. However, as new examples are added to the annotated corpus, corpus-based methods can be applied, either replacing or complementing the initial knowledge-based models. This paper describes an active learning scheme, based on an SLU criterion, used to automatically update the SLU models of a deployed SDS. Two kinds of SLU models are compared: rule-based models used in the deployed system, consisting of several thousand hand-crafted rules, and corpus-based models, built by automatically learning classifiers on an annotated corpus.
Citations: 11
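The selection step of such a scheme can be as simple as uncertainty sampling: hand the annotators the dialog traces on which the deployed SLU model is least confident. The Python sketch below illustrates that generic idea on made-up data; the paper's actual SLU criterion is not reproduced here.

```python
def select_for_annotation(utterances, confidences, budget):
    # Return the `budget` utterances with the lowest SLU confidence; these are
    # the candidates sent for manual annotation before retraining the
    # corpus-based classifiers.
    ranked = sorted(zip(confidences, utterances))        # ascending confidence
    return [utt for _, utt in ranked[:budget]]

# Toy daily batch of dialog traces with confidences from the deployed SLU model.
traces = ["je veux resilier mon abonnement", "euh alors voila", "probleme de facture"]
confs = [0.92, 0.31, 0.58]
print(select_for_annotation(traces, confs, budget=1))    # -> the least confident trace
```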
Noise robust model adaptation using linear spline interpolation
2009 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2009-12-01. DOI: 10.1109/ASRU.2009.5373430
K. Kalgaonkar, M. Seltzer, A. Acero
Abstract: This paper presents a novel data-driven technique for adapting acoustic models to noisy environments. In the presence of additive noise, the relationship between the log mel spectra of speech, noise, and noisy speech is nonlinear. Traditional methods linearize this relationship around the mode of the nonlinearity or use some other approximation. The approach presented here models the nonlinear relationship using linear spline regression: the set of spline parameters that minimizes the error between predicted and actual noisy speech features is learned from training data and used at runtime to adapt clean acoustic model parameters to the current noise conditions. Experiments on the Aurora 2 task show that the proposed adaptation algorithm (89.22% word accuracy) outperforms VTS model adaptation (88.38% word accuracy).
Citations: 8
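The core fitting step lends itself to a compact illustration: with a fixed set of knots, a linear spline is just a least-squares regression on a truncated-linear (hinge) basis, and the fitted curve can then map clean model means into the current noise condition. The Python sketch below uses synthetic data and arbitrary knots; it is not the paper's exact parameterization or adaptation procedure.

```python
import numpy as np

def spline_basis(x, knots):
    # Truncated-linear (hinge) basis for a linear spline with fixed knots:
    # [1, x, (x - k1)_+, (x - k2)_+, ...].
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.stack(cols, axis=1)

def fit_linear_spline(x_clean, y_noisy, knots):
    # Least-squares spline weights minimizing the error between predicted
    # and observed noisy log-mel features (one frequency band).
    B = spline_basis(x_clean, knots)
    w, *_ = np.linalg.lstsq(B, y_noisy, rcond=None)
    return w

def adapt_mean(mu_clean, w, knots):
    # Map a clean acoustic-model mean through the learned spline to estimate
    # its value under the current noise condition.
    return float((spline_basis(np.array([mu_clean]), knots) @ w)[0])

# Synthetic illustration: noisy log-energy saturates toward the noise floor.
rng = np.random.default_rng(0)
clean = rng.uniform(-5.0, 15.0, 500)
noisy = np.logaddexp(clean, 5.0) + 0.1 * rng.standard_normal(500)  # log-sum of speech and noise
knots = np.linspace(-3.0, 13.0, 6)
w = fit_linear_spline(clean, noisy, knots)
print("adapted mean for mu_clean = 2.0:", adapt_mean(2.0, w, knots))
```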
Automatic detection of vowel pronunciation errors using multiple information sources
2009 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2009-12-01. DOI: 10.1109/asru.2009.5373335
Joost van Doremalen, C. Cucchiarini, H. Strik
Abstract: Frequent pronunciation errors made by L2 learners of Dutch often involve vowel substitutions. ASR-based confidence measures (CMs) are generally used to detect such pronunciation errors. This paper compares and combines confidence measures with MFCCs and phonetic features. The results show that MFCCs perform best, followed by CMs and then phonetic features, and that substantial improvements can be obtained by combining the different feature types.
Citations: 26
The Asian network-based speech-to-speech translation system
2009 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2009-12-01. DOI: 10.1109/ASRU.2009.5373353
S. Sakti, Noriyuki Kimura, Michael Paul, Chiori Hori, E. Sumita, Satoshi Nakamura, Jun Park, C. Wutiwiwatchai, Bo Xu, Hammam Riza, K. Arora, C. Luong, Haizhou Li
Abstract: This paper outlines the first Asian network-based speech-to-speech translation system, developed by the Asian Speech Translation Advanced Research (A-STAR) consortium. The system translates common spoken utterances from travel conversations from a given source language into multiple target languages, facilitating multiparty travel conversations between people speaking different Asian languages. Each A-STAR member contributes one or more of the following spoken language technologies through Web servers: automatic speech recognition, machine translation, and text-to-speech. The system currently covers nine languages: eight Asian languages (Hindi, Indonesian, Japanese, Korean, Malay, Thai, Vietnamese, and Chinese) plus English. Its domain covers about 20,000 travel expressions, including proper nouns that are names of famous places or attractions in Asian countries. The paper discusses the difficulties involved in connecting different spoken language translation systems through Web servers and presents speech-translation results from the first A-STAR demo experiments, carried out in July 2009.
Citations: 15
Acoustic emotion recognition: A benchmark comparison of performances
2009 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2009-12-01. DOI: 10.1109/ASRU.2009.5372886
Björn Schuller, Bogdan Vlasenko, F. Eyben, G. Rigoll, A. Wendemuth
Abstract: In light of the first challenge on emotion recognition from speech, this paper provides the largest benchmark comparison to date under equal conditions on nine standard corpora, using the two predominant paradigms: frame-level modeling by means of hidden Markov models, and supra-segmental modeling by systematic feature brute-forcing. The investigated corpora are the ABC, AVIC, DES, EMO-DB, eNTERFACE, SAL, SmartKom, SUSAS, and VAM databases. To improve comparability among sets, each database's emotions are additionally clustered into binary valence and arousal discrimination tasks. The results show large differences among corpora, stemming mostly from naturalistic emotions and spontaneous speech versus more prototypical events. Further, supra-segmental modeling proves significantly beneficial on average when several classes are addressed at a time.
Citations: 268
Speaker de-identification via voice transformation
2009 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2009-12-01. DOI: 10.1109/ASRU.2009.5373356
Qin Jin, Arthur R. Toth, Tanja Schultz, A. Black
Abstract: Modern automated voice-driven applications and services commonly record and transmit a user's spoken request. At the same time, several domains and applications may require that the content of the request be transmitted while the speaker's identity is kept confidential. This requires technology that de-identifies the speaker's voice, so that the voice sounds natural and intelligible but does not reveal who is speaking. This paper investigates different voice transformation strategies on a large population of speakers to disguise speaker identity while preserving the intelligibility of the voices. Two automatic speaker identification approaches, a GMM-based and a phonetic one, are applied to verify the success of de-identification by voice transformation. The evaluation confirms that the proposed voice transformation technique enables transmission of the content of users' spoken requests while successfully concealing their identities, and the results indicate that different speakers still sound distinct after transformation. Furthermore, a human listening test showed the transformed speech to be both intelligible and securely de-identified, hiding the speakers' identities even from listeners who knew them very well.
Citations: 55
Sub-band modulation spectrum compensation for robust speech recognition
2009 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2009-12-01. DOI: 10.1109/ASRU.2009.5373506
Wen-hsiang Tu, Sheng-Yuan Huang, J. Hung
Abstract: This paper proposes a novel scheme for applying feature statistics normalization techniques for robust speech recognition. In the proposed approach, the temporal-domain feature sequence is first converted into the modulation spectral domain. The magnitude part of the modulation spectrum is decomposed into non-uniform sub-band segments, and each sub-band segment is individually processed by well-known normalization methods such as mean normalization (MN), mean and variance normalization (MVN), and histogram equalization (HEQ). Finally, the feature stream is reconstructed from the modified sub-band magnitude spectral segments and the original phase spectrum using the inverse DFT. With this process, the components corresponding to the more important modulation spectral bands in the feature sequence can be processed separately. For the Aurora-2 clean-condition training task, the proposed sub-band spectral MVN and HEQ provide relative error rate reductions of 18.66% and 23.58% over conventional temporal MVN and HEQ, respectively.
Citations: 11
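To make the processing chain concrete, the Python sketch below applies sub-band MVN in the modulation spectral domain to a single feature trajectory: DFT, per-band rescaling of the magnitude toward reference statistics, then resynthesis with the original phase. The band edges and reference statistics are illustrative assumptions, and the paper's exact MN/MVN/HEQ formulations may differ.

```python
import numpy as np

def subband_mvn(feature_seq, band_edges, ref_mean, ref_std):
    # Sub-band modulation-spectrum MVN of one cepstral-coefficient trajectory:
    # DFT the sequence, rescale the magnitude within each sub-band toward
    # reference statistics, then resynthesize with the original phase.
    spec = np.fft.rfft(feature_seq)
    mag, phase = np.abs(spec), np.angle(spec)
    for b, (lo, hi) in enumerate(band_edges):
        seg = mag[lo:hi]
        if seg.size > 1 and seg.std() > 0:
            mag[lo:hi] = (seg - seg.mean()) / seg.std() * ref_std[b] + ref_mean[b]
    return np.fft.irfft(mag * np.exp(1j * phase), n=len(feature_seq))

# Toy example: one MFCC trajectory and non-uniform bands that emphasize the
# low modulation frequencies (band edges and reference statistics are made up;
# in practice the references would be estimated from clean training data).
T = 200
traj = np.random.default_rng(1).standard_normal(T).cumsum() * 0.1
bands = [(0, 4), (4, 12), (12, 32), (32, T // 2 + 1)]
ref_mean = np.array([8.0, 4.0, 2.0, 1.0])
ref_std = np.array([2.0, 1.5, 1.0, 0.5])
print(subband_mvn(traj, bands, ref_mean, ref_std).shape)
```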
Diagonal priors for full covariance speech recognition
2009 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2009-12-01. DOI: 10.1109/ASRU.2009.5373344
P. Bell, Simon King
Abstract: We investigate the use of full covariance Gaussians for large-vocabulary speech recognition. The large number of parameters gives high modelling power, but when training data is limited, the standard sample covariance matrix is often poorly conditioned and has high variance. We explain how these problems may be solved by the use of a diagonal covariance smoothing prior and relate this to the shrinkage estimator, for which the optimal shrinkage parameter may itself be estimated from the training data. We also compare the use of generatively and discriminatively trained priors. Results are presented on a large-vocabulary conversational telephone speech recognition task.
Citations: 8
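The diagonal smoothing prior amounts to shrinking the sample covariance toward its own diagonal. Below is a minimal numpy sketch with an arbitrary fixed shrinkage weight, rather than the data-estimated optimal value the paper discusses.

```python
import numpy as np

def shrink_to_diagonal(sample_cov, lam):
    # Smooth the sample covariance toward its diagonal:
    #   Sigma = (1 - lam) * S + lam * diag(S)
    # lam = 0 keeps the full sample covariance; lam = 1 falls back to a
    # purely diagonal model.
    return (1.0 - lam) * sample_cov + lam * np.diag(np.diag(sample_cov))

# Toy illustration: few frames in a 39-dimensional feature space give a
# rank-deficient, poorly conditioned sample covariance; shrinkage fixes that.
rng = np.random.default_rng(2)
X = rng.standard_normal((20, 39))
S = np.cov(X, rowvar=False)
print("condition number before:", np.linalg.cond(S))
print("condition number after: ", np.linalg.cond(shrink_to_diagonal(S, lam=0.3)))
```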
Optimal quantization and bit allocation for compressing large discriminative feature space transforms
2009 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2009-12-01. DOI: 10.1109/ASRU.2009.5373407
E. Marcheret, V. Goel, P. Olsen
Abstract: Discriminative training of the feature space using the minimum phone error (MPE) objective function has been shown to yield remarkable accuracy improvements. These gains, however, come at the high cost of memory required to store the transform. In a previous paper we reduced this memory requirement by 94% by quantizing the transform parameters, using dimension-dependent quantization tables and learning the quantization values with a fixed assignment of transform parameters to quantization values. In this paper we refine and extend these techniques to attain a further 35% reduction in memory with no degradation in sentence error rate. We describe a principled method to assign transform parameters to quantization values, and show how memory can be gradually reduced using a Viterbi algorithm that optimally assigns a variable number of bits to the dimension-dependent quantization tables. The techniques described could also be applied to the quantization of general linear transforms, a problem of wider interest.
Citations: 5
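One way to read the bit-allocation step is as a trellis search: each dimension may spend 0..B bits, and a dynamic program over (dimension, bits spent) finds the assignment minimizing total quantization distortion under the global budget. The Python sketch below implements that generic dynamic program on a made-up distortion table; it is an interpretation of the idea, not the paper's exact algorithm.

```python
import numpy as np

def allocate_bits(distortion, total_bits):
    # distortion[d][b] = quantization distortion of dimension d when its
    # quantization table is given b bits. A dynamic program over the trellis
    # of (dimension, bits spent so far) returns the per-dimension bit counts
    # minimizing total distortion under the global budget.
    n_dims, max_b = distortion.shape
    INF = float("inf")
    cost = np.full((n_dims + 1, total_bits + 1), INF)
    back = np.zeros((n_dims + 1, total_bits + 1), dtype=int)
    cost[0, 0] = 0.0
    for d in range(n_dims):
        for used in range(total_bits + 1):
            if cost[d, used] == INF:
                continue
            for b in range(min(max_b, total_bits - used + 1)):
                c = cost[d, used] + distortion[d, b]
                if c < cost[d + 1, used + b]:
                    cost[d + 1, used + b] = c
                    back[d + 1, used + b] = b
    used = int(np.argmin(cost[n_dims]))          # best terminal state
    bits = []
    for d in range(n_dims, 0, -1):               # trace back the chosen bits
        b = int(back[d, used])
        bits.append(b)
        used -= b
    return bits[::-1]

# Made-up distortion table: 4 dimensions, 0..5 bits each; distortion roughly
# halves per extra bit, with a dimension-dependent scale.
scales = np.array([4.0, 2.0, 1.0, 0.5])
distortion = scales[:, None] * 2.0 ** -np.arange(6)[None, :]
print(allocate_bits(distortion, total_bits=8))
```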