Latest publications: 2011 IEEE Workshop on Automatic Speech Recognition & Understanding

Gain estimation approaches in catalog-based single-channel speech-music separation
2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163928
Cemil Demir, A. Cemgil, M. Saraçlar
{"title":"Gain estimation approaches in catalog-based single-channel speech-music separation","authors":"Cemil Demir, A. Cemgil, M. Saraçlar","doi":"10.1109/ASRU.2011.6163928","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163928","url":null,"abstract":"In this study, we analyze the gain estimation problem of the catalog-based single-channel speech-music separation method, which we proposed previously. In the proposed method, assuming that we know a catalog of the background music, we developed a generative model for the superposed speech and music spectrograms. We represent the speech spectrogram by a Non-Negative Matrix Factorization (NMF) model and the music spectrogram by a conditional Poisson Mixture Model (PMM). In this model, we assume that the background music is generated by repeating and changing the gain of the jingle in the music catalog. Although the separation performance of the proposed method is satisfactory with known gain values, the performance decreases when the gain value of the jingle is unknown and has to be estimated. In this paper, we address the gain estimation problem of the catalog-based method and propose three different approaches to overcome this problem. One of these approaches is to use Gamma Markov Chain (GMC) probabilistic structure to impose the correlation between the gain parameters across the time frames. By using GMC, the gain parameter is estimated more accurately. The other approaches are maximum a posteriori (MAP) and piece-wise constant estimation (PCE) of the gain values. Although all three methods improve the separation performance as compared to the original method itself, GMC approach achieved the best performance.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122946618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
From Modern Standard Arabic to Levantine ASR: Leveraging GALE for dialects
2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163942
H. Soltau, L. Mangu, Fadi Biadsy
{"title":"From Modern Standard Arabic to Levantine ASR: Leveraging GALE for dialects","authors":"H. Soltau, L. Mangu, Fadi Biadsy","doi":"10.1109/ASRU.2011.6163942","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163942","url":null,"abstract":"We report a series of experiments about how we can progress from Modern Standard Arabic (MSA) to Levantine ASR, in the context of the GALE DARPA program. While our GALE models achieved very low error rates, we still see error rates twice as high when decoding dialectal data. In this paper, we make use of a state-of-the-art Arabic dialect recognition system to automatically identify Levantine and MSA subsets in mixed speech of a variety of dialects including MSA. Training separate models on these subsets, we show a significant reduction in word error rate over using the entire data set to train one system for both dialects. During decoding, we use a tree array structure to mix Levantine and MSA models automatically using the posterior probabilities of the dialect classifier as soft weights. This technique allows us to mix these models without sacrificing performance for either varieties. Furthermore, using the initial acoustic-based dialect recognition system's output, we show that we can bootstrap a text-based dialect classifier and use it to identify relevant text data for building Levantine language models. Moreover, we compare different vowelization approaches when transitioning from MSA to Levantine models.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126090793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
Utterance verification using garbage words for a hospital appointment system with speech interface
2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163954
Mitsuru Takaoka, H. Nishizaki, Y. Sekiguchi
{"title":"Utterance verification using garbage words for a hospital appointment system with speech interface","authors":"Mitsuru Takaoka, H. Nishizaki, Y. Sekiguchi","doi":"10.1109/ASRU.2011.6163954","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163954","url":null,"abstract":"On a system that captures spoken dialog, users often use out-of-domain utterances to the system. The speech recognition component in the dialog system cannot correctly recognize such utterances, which causes fatal errors. This paper proposes a method to verify whether utterances are in-domain or out-of-domain. The proposed method trains systems with two language models: one that can accept both in-domain and out-of-domain utterances and the other that can accept only in-domain utterances. These models are installed into two speech recognition systems. A comparison of the recognizers' outputs provides a good verification of utterances. We installed our method in a hospital appointment system and evaluated it. The experimental results showed that the proposed method worked well.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121796495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Improving reverberant VTS for hands-free robust speech recognition
2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163915
Yongqiang Wang, M. Gales
{"title":"Improving reverberant VTS for hands-free robust speech recognition","authors":"Yongqiang Wang, M. Gales","doi":"10.1109/ASRU.2011.6163915","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163915","url":null,"abstract":"Model-based approaches to handling additive background noise and channel distortion, such as Vector Taylor Series (VTS), have been intensively studied and extended in a number of ways. In previous work, VTS has been extended to handle both reverberant and background noise, yielding the Reverberant VTS (RVTS) scheme. In this work, rather than assuming the observation vector is generated by the reverberation of a sequence of background noise corrupted speech vectors, as in RVTS, the observation vector is modelled as a superposition of the background noise and the reverberation of clean speech. This yields a new compensation scheme RVTS Joint (RVTSJ), which allows an easy formulation for joint estimation of both additive and reverberation noise parameters. These two compensation schemes were evaluated and compared on a simulated reverberant noise corrupted AURORA4 task. Both yielded large gains over VTS baseline system, with RVTSJ outperforming the previous RVTS scheme.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131188685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Latent semantic analysis for question classification with neural networks
2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163971
B. Loni, Seyedeh Halleh Khoshnevis, P. Wiggers
{"title":"Latent semantic analysis for question classification with neural networks","authors":"B. Loni, Seyedeh Halleh Khoshnevis, P. Wiggers","doi":"10.1109/ASRU.2011.6163971","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163971","url":null,"abstract":"An important component of question answering systems is question classification. The task of question classification is to predict the entity type of the answer of a natural language question. Question classification is typically done using machine learning techniques. Most approaches use features based on word unigrams which leads to large feature space. In this work we applied Latent Semantic Analysis (LSA) technique to reduce the large feature space of questions to a much smaller and efficient feature space. We used two different classifiers: Back-Propagation Neural Networks (BPNN) and Support Vector Machines (SVM). We found that applying LSA on question classification can not only make the question classification more time efficient, but it also improves the classification accuracy by removing the redundant features. Furthermore, we discovered that when the original feature space is compact and efficient, its reduced space performs better than a large feature space with a rich set of features. In addition, we found that in the reduced feature space, BPNN performs better than SVMs which are widely used in question classification. Our result on the well known UIUC dataset is competitive with the state-of-the-art in this field, even though we used much smaller feature spaces.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129837047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Detection of persons with Parkinson's disease by acoustic, vocal, and prosodic analysis
2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163978
T. Bocklet, E. Nöth, G. Stemmer, Hana Ruzickova, J. Rusz
{"title":"Detection of persons with Parkinson's disease by acoustic, vocal, and prosodic analysis","authors":"T. Bocklet, E. Nöth, G. Stemmer, Hana Ruzickova, J. Rusz","doi":"10.1109/ASRU.2011.6163978","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163978","url":null,"abstract":"70% to 90% of patients with Parkinson's disease (PD) show an affected voice. Various studies revealed, that voice and prosody is one of the earliest indicators of PD. The issue of this study is to automatically detect whether the speech/voice of a person is affected by PD. We employ acoustic features, prosodic features and features derived from a two-mass model of the vocal folds on different kinds of speech tests: sustained phonations, syllable repetitions, read texts and monologues. Classification is performed in either case by SVMs. A correlation-based feature selection was performed, in order to identify the most important features for each of these systems. We report recognition results of 91% when trying to differentiate between normal speaking persons and speakers with PD in early stages with prosodic modeling. With acoustic modeling we achieved a recognition rate of 88% and with vocal modeling we achieved 79%. After feature selection these results could greatly be improved. But we expect those results to be too optimistic. We show that read texts and monologues are the most meaningful texts when it comes to the automatic detection of PD based on articulation, voice, and prosodic evaluations. The most important prosodic features were based on energy, pauses and F0. The masses and the compliances of spring were found to be the most important parameters of the two-mass vocal fold model.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129309087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 72
Subspace Gaussian Mixture Models for vectorial HMM-states representation
2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163984
M. Bouallegue, D. Matrouf, Mickael Rouvier, G. Linarès
{"title":"Subspace Gaussian Mixture Models for vectorial HMM-states representation","authors":"M. Bouallegue, D. Matrouf, Mickael Rouvier, G. Linarès","doi":"10.1109/ASRU.2011.6163984","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163984","url":null,"abstract":"In this paper we present a vectorial representation of the HMM states that is inspired by the Subspace Gaussian Mixture Models paradigm (SGMM). This vectorial representation of states will make possible a large number of applications, such as HMM-states clustering and graphical visualization. Thanks to this representation, the Hidden Markov Model (HMM) states can be seen as sets of points in multi-dimensional space and then can be studied using statistical data analysis techniques. In this paper, we show how this representation can be obtained and used for tying states of an HHM-based automatic speech recognition system without any use of linguistic or phonetic knowledge. In experiments, this approach achieves significant and stable gain, while conserving the classical approach based on decision trees. We also show how it can be used for graphical visualization, which can be useful in other domains like phonetics or clinical phonetics.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"4498 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127720933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Socio-situational setting classification based on language use
2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163974
Yangyang Shi, P. Wiggers, C. Jonker
{"title":"Socio-situational setting classification based on language use","authors":"Yangyang Shi, P. Wiggers, C. Jonker","doi":"10.1109/ASRU.2011.6163974","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163974","url":null,"abstract":"We present a method for automatic classification of the socio-situational setting of a conversation based on the language used. The socio-situational setting depicts the social background of a conversation which involves the communicative goals, number of speakers, number of listeners and the relationship among the speakers and the listeners. Knowledge of the socio-situational setting can be used to search for content recorded in a particular setting or to select context-dependent models for example for speech recognition. We investigated the performance of different feature sets of conversation level features and word level features and their combinations on this task. Our final system, that classifies the conversations in the Spoken Dutch Corpus in one of 14 socio-situational settings, achieves an accuracy of 89.55%.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130960302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Making Deep Belief Networks effective for large vocabulary continuous speech recognition
2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163900
Tara N. Sainath, Brian Kingsbury, B. Ramabhadran, P. Fousek, Petr Novák, Abdel-rahman Mohamed
{"title":"Making Deep Belief Networks effective for large vocabulary continuous speech recognition","authors":"Tara N. Sainath, Brian Kingsbury, B. Ramabhadran, P. Fousek, Petr Novák, Abdel-rahman Mohamed","doi":"10.1109/ASRU.2011.6163900","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163900","url":null,"abstract":"To date, there has been limited work in applying Deep Belief Networks (DBNs) for acoustic modeling in LVCSR tasks, with past work using standard speech features. However, a typical LVCSR system makes use of both feature and model-space speaker adaptation and discriminative training. This paper explores the performance of DBNs in a state-of-the-art LVCSR system, showing improvements over Multi-Layer Perceptrons (MLPs) and GMM/HMMs across a variety of features on an English Broadcast News task. In addition, we provide a recipe for data parallelization of DBN training, showing that data parallelization can provide linear speed-up in the number of machines, without impacting WER.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116734991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 198
Accent level adjustment in bilingual Thai-English text-to-speech synthesis
2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163947
C. Wutiwiwatchai, A. Thangthai, A. Chotimongkol, C. Hansakunbuntheung, N. Thatphithakkul
{"title":"Accent level adjustment in bilingual Thai-English text-to-speech synthesis","authors":"C. Wutiwiwatchai, A. Thangthai, A. Chotimongkol, C. Hansakunbuntheung, N. Thatphithakkul","doi":"10.1109/ASRU.2011.6163947","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163947","url":null,"abstract":"This paper introduces an accent level adjustment mechanism for Thai-English text-to-speech synthesis (TTS). English words often appearing in modern Thai writing can be speech synthesized by either Thai TTS using corresponding Thai phones or by separated English TTS using English phones. As many Thai native listeners may not prefer any of such extreme accent styles, a mechanism that allows selecting accent level preference is proposed. In HMM-based TTS, adjusting the accent level is done by interpolating HMMs of purely Thai and purely English sounds. Solutions for cross-language phone alignment and HMM state mapping are addressed. Evaluations are performed by a listening test on sounds synthesized with varied accent levels. Experimental results show that the proposed method is acceptable by the majority of human listeners.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114714211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9