Latest publications: 2011 IEEE Workshop on Automatic Speech Recognition & Understanding

Matched-condition robust Dynamic Noise Adaptation
2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163919
Steven J. Rennie, Pierre L. Dognin, P. Fousek
Abstract: In this paper we describe how the model-based noise robustness algorithm for previously unseen noise conditions, Dynamic Noise Adaptation (DNA), can be made robust to matched data without the need for any system re-training. The approach performs online model selection and averaging between two DNA models of noise: one that tracks the evolving state of the background noise, and one clamped to the null mismatch hypothesis. The approach, which we call DNA with (matched) condition detection (DNA-CD), improves the performance of a commercial-grade speech recognizer that utilizes feature-space Maximum Mutual Information (fMMI), boosted MMI (bMMI), and feature-space Maximum Likelihood Linear Regression (fMLLR) compensation by 15% relative at signal-to-noise ratios (SNRs) below 10 dB, and by over 8% relative overall.
Citations: 5
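The online selection-and-averaging step at the core of DNA-CD can be sketched as follows. This is a minimal illustration under the assumption that per-frame log-likelihoods are available from both noise models; the function and variable names are hypothetical, not taken from the paper:

```python
import numpy as np

def dna_cd_weight(loglik_tracking, loglik_clamped, prior_tracking=0.5):
    """Online model selection/averaging between two noise hypotheses.

    loglik_tracking: log-likelihood of the current frame under the
    evolving noise-tracking model; loglik_clamped: under the clamped
    null-mismatch (matched-condition) model. Returns the posterior
    weight of the tracking model, used to average the two models'
    compensations.
    """
    a = loglik_tracking + np.log(prior_tracking)
    b = loglik_clamped + np.log(1.0 - prior_tracking)
    m = max(a, b)  # stabilize the softmax in log space
    return np.exp(a - m) / (np.exp(a - m) + np.exp(b - m))

# averaged compensation (illustrative):
# x_clean = w * clean_est_tracking + (1 - w) * clean_est_clamped
```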
Efficient spoken term discovery using randomized algorithms
2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163965
A. Jansen, Benjamin Van Durme
Abstract: Spoken term discovery is the task of automatically identifying words and phrases in speech data by searching for long repeated acoustic patterns. Initial solutions relied on exhaustive dynamic time warping-based searches across the entire similarity matrix, a method whose scalability is ultimately limited by the O(n²) nature of the search space. Recent strategies have attempted to improve search efficiency by using either unsupervised or mismatched-language acoustic models to reduce the complexity of the feature representation. Taking a completely different approach, this paper investigates the use of randomized algorithms that operate directly on the raw acoustic features to produce sparse approximate similarity matrices in O(n) space and O(n log n) time. We demonstrate that these techniques enable spoken term discovery performance that outperforms a model-based strategy in the zero-resource setting.
Citations: 165
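The randomized core of such an approach can be illustrated with random-hyperplane locality-sensitive hashing: each frame-level feature vector is mapped to a short bit signature whose Hamming distance approximates the cosine angle between the original vectors, so candidate matches can be found without computing the full similarity matrix. A minimal sketch (the paper's actual indexing and sparse-matrix construction are more involved; names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_signatures(feats, n_bits=64):
    """Project features onto random hyperplanes; the sign pattern is a
    compact bit signature. feats: (n_frames, dim) array."""
    planes = rng.standard_normal((feats.shape[1], n_bits))
    return (feats @ planes > 0).astype(np.uint8)

def approx_cosine(sig_a, sig_b):
    """For random-hyperplane LSH, P(bit differs) = theta / pi, so
    cos(theta) ~= cos(pi * hamming / n_bits)."""
    hamming = np.count_nonzero(sig_a != sig_b)
    return np.cos(np.pi * hamming / sig_a.size)
```

Sorting the signatures lexicographically brings near-duplicates close together, which is what yields the near-linear time behavior.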
Improved spoken term detection using support vector machines with acoustic and context features from pseudo-relevance feedback
2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163962
Tsung-wei Tu, Hung-yi Lee, Lin-Shan Lee
Abstract: This paper reports a new approach to improving spoken term detection that uses support vector machines (SVMs) with acoustic and linguistic features. Since SVMs discriminate well between different features in vector space, we recently proposed using pseudo-relevance feedback to automatically generate training data for SVM training, and using the SVM to re-rank the first-pass results based on context consistency in the lattices. In this paper, we extend this concept by considering acoustic features at the word, phone, and HMM-state levels, and linguistic features of different orders. Extensive experiments under various recognition environments demonstrate significant improvements in all cases. In particular, the acoustic features at the HMM-state level offer the most significant improvements, and the improvements achieved by acoustic and linguistic features are shown to be additive.
Citations: 18
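The pseudo-relevance feedback idea, training an SVM from the system's own first-pass ranking, can be sketched as below. The sketch assumes one feature vector per detection hypothesis and uses scikit-learn's LinearSVC for concreteness; the paper's feature design and re-ranking details are richer:

```python
import numpy as np
from sklearn.svm import LinearSVC

def prf_rerank(features, first_pass_scores, n_pos=10, n_neg=10):
    """Pseudo-relevance feedback re-ranking: treat the top-scoring
    first-pass detections as pseudo-positive examples and the
    lowest-scoring as pseudo-negative, train an SVM on their feature
    vectors, and re-rank all hypotheses by the SVM margin."""
    order = np.argsort(first_pass_scores)[::-1]
    pos, neg = order[:n_pos], order[-n_neg:]
    X = features[np.concatenate([pos, neg])]
    y = np.array([1] * n_pos + [0] * n_neg)
    svm = LinearSVC().fit(X, y)
    return svm.decision_function(features)  # new ranking scores
```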
Applying feature bagging for more accurate and robust automated speaking assessment
2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163977
L. Chen
Abstract: The scoring model used in automated speaking assessment systems is critical for achieving accurate and robust automatic scoring of speaking skills. In automated speaking assessment research, using a single classifier model is still the dominant approach. However, ensemble learning, which relies on a committee of classifiers to predict jointly (overcoming each individual classifier's weaknesses), has been actively advocated by machine learning researchers and is widely used in many machine learning tasks. In this paper, we investigate applying a particular ensemble learning method, feature bagging, to the task of automatically scoring non-native spontaneous speech. Our experiments show that this method is superior to using a single classifier in terms of both scoring accuracy and robustness to feature variations.
Citations: 1
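Feature bagging itself is straightforward to sketch: train each committee member on a random subset of the feature columns and average their predictions, so no single unreliable feature can dominate the final score. A minimal illustration with a logistic-regression base classifier (the paper's base learner and subset scheme may differ):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def feature_bagging_fit(X, y, n_models=20, subset_frac=0.5, seed=0):
    """Train a committee of classifiers, each seeing only a random
    subset of the feature columns."""
    rng = np.random.default_rng(seed)
    k = max(1, int(subset_frac * X.shape[1]))
    models = []
    for _ in range(n_models):
        cols = rng.choice(X.shape[1], size=k, replace=False)
        clf = LogisticRegression(max_iter=1000).fit(X[:, cols], y)
        models.append((cols, clf))
    return models

def feature_bagging_predict(models, X):
    """Average the committee's class-1 probabilities."""
    return np.mean([clf.predict_proba(X[:, cols])[:, 1]
                    for cols, clf in models], axis=0)
```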
A factored conditional random field model for articulatory feature forced transcription
2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163909
Rohit Prabhavalkar, E. Fosler-Lussier, Karen Livescu
Abstract: We investigate joint models of articulatory features and apply these models to the problem of automatically generating articulatory transcriptions of spoken utterances given their word transcriptions. The task is motivated by the need for larger amounts of labeled articulatory data for both speech recognition and linguistics research, which is costly and difficult to obtain through manual transcription or physical measurement. Unlike phonetic transcription, in our task it is important to account for the fact that the articulatory features can desynchronize. We consider factored models of the articulatory state space with an explicit model of articulator asynchrony. We compare two types of graphical models: a dynamic Bayesian network (DBN), based on previously proposed models; and a conditional random field (CRF), which we develop here. We demonstrate how task-specific constraints can be leveraged to allow for efficient exact inference in the CRF. On the transcription task, the CRF outperforms the DBN, with relative improvements of 2.2% to 10.0%.
Citations: 12
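The paper's factored CRF is considerably richer than what fits here, but the exact-inference building block such models rely on is the standard forward recursion over a lattice of log-potentials. A minimal linear-chain sketch of that recursion (the state space and potentials are placeholders, not the factored articulatory model):

```python
import numpy as np
from scipy.special import logsumexp

def crf_log_partition(unary, trans):
    """Forward recursion for exact inference in a linear-chain CRF.

    unary: (T, S) per-frame log-potentials over S states;
    trans:  (S, S) log transition potentials.
    Returns log Z, the normalizer needed in CRF training.
    """
    alpha = unary[0]
    for t in range(1, unary.shape[0]):
        # alpha[j] = log sum_i exp(alpha[i] + trans[i, j]) + unary[t, j]
        alpha = unary[t] + logsumexp(alpha[:, None] + trans, axis=0)
    return logsumexp(alpha)
```

In the factored case, the state indexes a tuple of articulatory features, and the task-specific constraints mentioned in the abstract prune the reachable tuples to keep this recursion tractable.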
Exploiting distance based similarity in topic models for user intent detection
2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163969
Asli Celikyilmaz, Dilek Z. Hakkani-Tür, Gökhan Tür, Ashley Fidler, D. Hillard
Abstract: One of the main components of spoken language understanding is intent detection, which allows user goals to be identified. A challenging sub-task of intent detection is the identification of intent-bearing phrases from a limited amount of training data, while maintaining the ability to generalize well. We present a new probabilistic topic model for jointly identifying semantic intents and common phrases in spoken language utterances. Our model jointly learns a set of intent-dependent phrases and captures semantic intent clusters as distributions over these phrases based on a distance-dependent sampling method. This sampling method uses the proximity of words within utterances when assigning words to latent topics. We evaluate our method on labeled utterances and present several examples of discovered semantic units. We demonstrate that our model outperforms standard topic models based on the bag-of-words assumption.
Citations: 13
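One way to picture distance-dependent sampling is a distance-dependent CRP-style prior: each word links to a nearby word with probability decaying in their distance, and connected link components form the phrase/topic clusters. The sketch below is only suggestive of such a prior; the paper's joint intent/phrase model is more elaborate, and all names are hypothetical:

```python
import numpy as np

def dd_link_probs(positions, alpha=1.0, decay=2.0):
    """Distance-dependent link probabilities: word i links to word j
    with probability proportional to exp(-|pos_i - pos_j| / decay),
    or starts its own cluster with weight alpha (last column)."""
    positions = np.asarray(positions, dtype=float)
    n = len(positions)
    probs = np.zeros((n, n + 1))
    for i in range(n):
        w = np.exp(-np.abs(positions[i] - positions) / decay)
        w[i] = 0.0               # no ordinary self-link
        probs[i, :n] = w
        probs[i, n] = alpha      # self-link = open a new cluster
        probs[i] /= probs[i].sum()
    return probs
```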
Estimating document frequencies in a speech corpus
2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163966
D. Karakos, Mark Dredze, K. Church, A. Jansen, S. Khudanpur
Abstract: Inverse Document Frequency (IDF) is an important quantity in many applications, including Information Retrieval. IDF is defined in terms of document frequency, df(w), the number of documents that mention w at least once. This quantity is relatively easy to compute over textual documents, but spoken documents are more challenging. This paper considers two baselines: (1) an estimate based on the 1-best ASR output and (2) an estimate based on expected term frequencies computed from the lattice. We improve over these baselines by taking advantage of repetition. Whatever the document is about is likely to be repeated, unlike ASR errors, which tend to be more random (Poisson). In addition, we find it helpful to consider an ensemble of language models. There is an opportunity for the ensemble to reduce noise, assuming that the errors across language models are relatively uncorrelated. The opportunity for improvement is larger when WER is high. This paper considers a pairing task application that could benefit from improved estimates of df. The pairing task takes as input conversation sides from the English Fisher corpus and outputs estimates of which sides came from the same conversation. Better estimates of df lead to better performance on this task.
Citations: 13
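The lattice-based baseline can be sketched directly: if a lattice gives posterior probabilities for each occurrence of w in a document, then the probability that w appears at least once is 1 − Π(1 − p_i), and summing this over documents estimates df(w). A minimal illustration (the data layout is an assumption, and the paper's repetition- and ensemble-based refinements are not shown):

```python
import numpy as np

def expected_df(corpus_posteriors, word):
    """Estimate document frequency from lattice word posteriors.

    corpus_posteriors: list of dicts, one per spoken document, mapping
    word -> list of posterior probabilities of its lattice occurrences.
    P(word occurs at least once in doc) = 1 - prod(1 - p_i); summing
    over documents degrades more gracefully with WER than counting
    1-best hits.
    """
    df = 0.0
    for doc in corpus_posteriors:
        ps = doc.get(word, [])          # empty list -> contributes 0
        df += 1.0 - np.prod([1.0 - p for p in ps])
    return df
```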
Minimum detection error training of subword detectors
2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163983
Alfonso M. Canterla, M. H. Johnsen
Abstract: This paper presents methods and results for optimizing subword detectors in continuous speech. Speech detectors are useful in areas such as detection-based ASR, pronunciation training, phonetic analysis, and word spotting. We propose a new discriminative training criterion for subword unit detectors that is based on the Minimum Phone Error framework. The criterion can optimize the F-score or any other detection performance metric. The method is applied to the optimization of HMMs and MFCC filterbanks in phone detectors. The resulting filterbanks differ from each other and reflect acoustic properties of the corresponding detection classes. For the experiments in TIMIT, the best optimized detectors had a relative accuracy improvement of 31.3% over baseline and 18.2% over our previous MCE-based method.
Citations: 2
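A detection metric of the kind such a criterion optimizes can be sketched as a simple time-tolerant F-score: each hypothesized detection is a hit if it falls within a tolerance of an as-yet-unmatched reference token. This illustration assumes detections and references are represented as center times; the paper's exact scoring may differ:

```python
def detection_f_score(hyp, ref, tol=0.02, beta=1.0):
    """F-score for a subword detector.

    hyp, ref: lists of detection/reference center times in seconds.
    A hypothesis is a hit if it lies within `tol` seconds of an
    unmatched reference token.
    """
    matched = [False] * len(ref)
    hits = 0
    for h in hyp:
        for j, r in enumerate(ref):
            if not matched[j] and abs(h - r) <= tol:
                matched[j] = True
                hits += 1
                break
    precision = hits / len(hyp) if hyp else 0.0
    recall = hits / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return ((1 + beta**2) * precision * recall
            / (beta**2 * precision + recall))
```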
Cross-lingual portability of Chinese and English neural network features for French and German LVCSR
2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163960
Christian Plahl, R. Schlüter, H. Ney
Abstract: This paper investigates neural network (NN) based cross-lingual probabilistic features. Earlier work reports that intra-lingual features consistently outperform the corresponding cross-lingual features. We show that this may not generalize. Depending on the complexity of the NN features, cross-lingual features reduce the resources needed for training (the NN has to be trained on one language only) without any loss in word error rate (WER). To further investigate this inconsistency between intra- and cross-lingual neural network features, we analyze the performance of these features with respect to the degree of kinship between training and testing language, and the amount of training data used. Whenever the same amount of data is used for NN training, a close relationship between training and testing language is required to achieve similar results; increasing the training data, or changing the NN topology to a bottleneck structure, weakens this dependence. Cross-lingual features trained on English or Chinese improve the best intra-lingual system by up to 2% relative in WER for German and up to 3% relative for French, matching the improvement obtained from discriminative training. Combining intra- and cross-lingual systems yields a further gain of up to 8% relative in WER.
Citations: 37
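Extracting such features amounts to forwarding frames through a trained NN and reading out a designated hidden layer: in the bottleneck case, a deliberately narrow layer whose activations serve as tandem features for the target-language system, so only this extractor needs training on the (possibly mismatched) source language. A minimal numpy sketch assuming tanh hidden layers (the paper's topology and nonlinearity are not specified here):

```python
import numpy as np

def bottleneck_features(frames, weights, biases, bottleneck_idx):
    """Forward an MLP up to its narrow bottleneck layer.

    frames: (n_frames, dim) input features; weights/biases: per-layer
    parameters of a trained NN. Returns the low-dimensional bottleneck
    activations used as features for the target-language recognizer.
    """
    h = frames
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = np.tanh(h @ W + b)
        if i == bottleneck_idx:
            return h
    return h  # fallback: final layer if no bottleneck index is hit
```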
Model-based parametric features for emotion recognition from speech
2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163987
Sankaranarayanan Ananthakrishnan, Aravind Namandi Vembu, R. Prasad
Abstract: Automatic emotion recognition from speech is desirable in many applications relying on spoken language processing. Telephone-based customer service systems, psychological healthcare initiatives, and virtual training modules are examples of real-world applications that would significantly benefit from such capability. Traditional utterance-level emotion recognition relies on a global feature set obtained by computing various statistics from raw segmental and supra-segmental measurements, including fundamental frequency (F0), energy, and MFCCs. In this paper, we propose a novel, model-based parametric feature set that better discriminates between the competing emotion classes. Our approach relaxes the modeling assumptions associated with using global statistics (e.g. mean, standard deviation, etc.) of traditional segment-level features for classification, and results in significant improvements over the state of the art in 7-way emotion classification accuracy on the standard, freely available Berlin Emotional Speech Corpus. These improvements are consistent even in a reduced feature space obtained by Fisher's Multiple Linear Discriminant Analysis, demonstrating the significantly higher discriminative power of the proposed feature set.
Citations: 12
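The traditional utterance-level baseline the paper improves on can be sketched as simple global pooling of frame-level tracks. A minimal illustration assuming numpy arrays of per-frame F0 and energy, with F0 pooled over voiced frames only (the feature choices are illustrative, not the paper's proposed parametric set):

```python
import numpy as np

def global_stats_features(f0, energy):
    """Traditional utterance-level features: pool frame-level F0 and
    energy tracks into global statistics (mean, std, min, max, IQR).
    Assumes f0 > 0 on voiced frames and that some voiced frames exist."""
    feats = []
    for track in (f0[f0 > 0], energy):
        feats += [track.mean(), track.std(), track.min(), track.max(),
                  np.percentile(track, 75) - np.percentile(track, 25)]
    return np.array(feats)
```

A model-based parametric approach replaces exactly this kind of fixed pooling with parameters fitted to the track's dynamics.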