{"title":"Gain estimation approaches in catalog-based single-channel speech-music separation","authors":"Cemil Demir, A. Cemgil, M. Saraçlar","doi":"10.1109/ASRU.2011.6163928","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163928","url":null,"abstract":"In this study, we analyze the gain estimation problem of the catalog-based single-channel speech-music separation method, which we proposed previously. In the proposed method, assuming that we know a catalog of the background music, we developed a generative model for the superposed speech and music spectrograms. We represent the speech spectrogram by a Non-Negative Matrix Factorization (NMF) model and the music spectrogram by a conditional Poisson Mixture Model (PMM). In this model, we assume that the background music is generated by repeating and changing the gain of the jingle in the music catalog. Although the separation performance of the proposed method is satisfactory with known gain values, the performance decreases when the gain value of the jingle is unknown and has to be estimated. In this paper, we address the gain estimation problem of the catalog-based method and propose three different approaches to overcome this problem. One of these approaches is to use Gamma Markov Chain (GMC) probabilistic structure to impose the correlation between the gain parameters across the time frames. By using GMC, the gain parameter is estimated more accurately. The other approaches are maximum a posteriori (MAP) and piece-wise constant estimation (PCE) of the gain values. Although all three methods improve the separation performance as compared to the original method itself, GMC approach achieved the best performance.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122946618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From Modern Standard Arabic to Levantine ASR: Leveraging GALE for dialects","authors":"H. Soltau, L. Mangu, Fadi Biadsy","doi":"10.1109/ASRU.2011.6163942","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163942","url":null,"abstract":"We report a series of experiments about how we can progress from Modern Standard Arabic (MSA) to Levantine ASR, in the context of the GALE DARPA program. While our GALE models achieved very low error rates, we still see error rates twice as high when decoding dialectal data. In this paper, we make use of a state-of-the-art Arabic dialect recognition system to automatically identify Levantine and MSA subsets in mixed speech of a variety of dialects including MSA. Training separate models on these subsets, we show a significant reduction in word error rate over using the entire data set to train one system for both dialects. During decoding, we use a tree array structure to mix Levantine and MSA models automatically using the posterior probabilities of the dialect classifier as soft weights. This technique allows us to mix these models without sacrificing performance for either varieties. Furthermore, using the initial acoustic-based dialect recognition system's output, we show that we can bootstrap a text-based dialect classifier and use it to identify relevant text data for building Levantine language models. Moreover, we compare different vowelization approaches when transitioning from MSA to Levantine models.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126090793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Utterance verification using garbage words for a hospital appointment system with speech interface","authors":"Mitsuru Takaoka, H. Nishizaki, Y. Sekiguchi","doi":"10.1109/ASRU.2011.6163954","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163954","url":null,"abstract":"On a system that captures spoken dialog, users often use out-of-domain utterances to the system. The speech recognition component in the dialog system cannot correctly recognize such utterances, which causes fatal errors. This paper proposes a method to verify whether utterances are in-domain or out-of-domain. The proposed method trains systems with two language models: one that can accept both in-domain and out-of-domain utterances and the other that can accept only in-domain utterances. These models are installed into two speech recognition systems. A comparison of the recognizers' outputs provides a good verification of utterances. We installed our method in a hospital appointment system and evaluated it. The experimental results showed that the proposed method worked well.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121796495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving reverberant VTS for hands-free robust speech recognition","authors":"Yongqiang Wang, M. Gales","doi":"10.1109/ASRU.2011.6163915","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163915","url":null,"abstract":"Model-based approaches to handling additive background noise and channel distortion, such as Vector Taylor Series (VTS), have been intensively studied and extended in a number of ways. In previous work, VTS has been extended to handle both reverberant and background noise, yielding the Reverberant VTS (RVTS) scheme. In this work, rather than assuming the observation vector is generated by the reverberation of a sequence of background noise corrupted speech vectors, as in RVTS, the observation vector is modelled as a superposition of the background noise and the reverberation of clean speech. This yields a new compensation scheme RVTS Joint (RVTSJ), which allows an easy formulation for joint estimation of both additive and reverberation noise parameters. These two compensation schemes were evaluated and compared on a simulated reverberant noise corrupted AURORA4 task. Both yielded large gains over VTS baseline system, with RVTSJ outperforming the previous RVTS scheme.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131188685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Latent semantic analysis for question classification with neural networks","authors":"B. Loni, Seyedeh Halleh Khoshnevis, P. Wiggers","doi":"10.1109/ASRU.2011.6163971","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163971","url":null,"abstract":"An important component of question answering systems is question classification. The task of question classification is to predict the entity type of the answer of a natural language question. Question classification is typically done using machine learning techniques. Most approaches use features based on word unigrams which leads to large feature space. In this work we applied Latent Semantic Analysis (LSA) technique to reduce the large feature space of questions to a much smaller and efficient feature space. We used two different classifiers: Back-Propagation Neural Networks (BPNN) and Support Vector Machines (SVM). We found that applying LSA on question classification can not only make the question classification more time efficient, but it also improves the classification accuracy by removing the redundant features. Furthermore, we discovered that when the original feature space is compact and efficient, its reduced space performs better than a large feature space with a rich set of features. In addition, we found that in the reduced feature space, BPNN performs better than SVMs which are widely used in question classification. Our result on the well known UIUC dataset is competitive with the state-of-the-art in this field, even though we used much smaller feature spaces.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129837047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detection of persons with Parkinson's disease by acoustic, vocal, and prosodic analysis","authors":"T. Bocklet, E. Nöth, G. Stemmer, Hana Ruzickova, J. Rusz","doi":"10.1109/ASRU.2011.6163978","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163978","url":null,"abstract":"70% to 90% of patients with Parkinson's disease (PD) show an affected voice. Various studies revealed, that voice and prosody is one of the earliest indicators of PD. The issue of this study is to automatically detect whether the speech/voice of a person is affected by PD. We employ acoustic features, prosodic features and features derived from a two-mass model of the vocal folds on different kinds of speech tests: sustained phonations, syllable repetitions, read texts and monologues. Classification is performed in either case by SVMs. A correlation-based feature selection was performed, in order to identify the most important features for each of these systems. We report recognition results of 91% when trying to differentiate between normal speaking persons and speakers with PD in early stages with prosodic modeling. With acoustic modeling we achieved a recognition rate of 88% and with vocal modeling we achieved 79%. After feature selection these results could greatly be improved. But we expect those results to be too optimistic. We show that read texts and monologues are the most meaningful texts when it comes to the automatic detection of PD based on articulation, voice, and prosodic evaluations. The most important prosodic features were based on energy, pauses and F0. The masses and the compliances of spring were found to be the most important parameters of the two-mass vocal fold model.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129309087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Subspace Gaussian Mixture Models for vectorial HMM-states representation","authors":"M. Bouallegue, D. Matrouf, Mickael Rouvier, G. Linarès","doi":"10.1109/ASRU.2011.6163984","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163984","url":null,"abstract":"In this paper we present a vectorial representation of the HMM states that is inspired by the Subspace Gaussian Mixture Models paradigm (SGMM). This vectorial representation of states will make possible a large number of applications, such as HMM-states clustering and graphical visualization. Thanks to this representation, the Hidden Markov Model (HMM) states can be seen as sets of points in multi-dimensional space and then can be studied using statistical data analysis techniques. In this paper, we show how this representation can be obtained and used for tying states of an HHM-based automatic speech recognition system without any use of linguistic or phonetic knowledge. In experiments, this approach achieves significant and stable gain, while conserving the classical approach based on decision trees. We also show how it can be used for graphical visualization, which can be useful in other domains like phonetics or clinical phonetics.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"4498 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127720933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Socio-situational setting classification based on language use","authors":"Yangyang Shi, P. Wiggers, C. Jonker","doi":"10.1109/ASRU.2011.6163974","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163974","url":null,"abstract":"We present a method for automatic classification of the socio-situational setting of a conversation based on the language used. The socio-situational setting depicts the social background of a conversation which involves the communicative goals, number of speakers, number of listeners and the relationship among the speakers and the listeners. Knowledge of the socio-situational setting can be used to search for content recorded in a particular setting or to select context-dependent models for example for speech recognition. We investigated the performance of different feature sets of conversation level features and word level features and their combinations on this task. Our final system, that classifies the conversations in the Spoken Dutch Corpus in one of 14 socio-situational settings, achieves an accuracy of 89.55%.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130960302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Making Deep Belief Networks effective for large vocabulary continuous speech recognition","authors":"Tara N. Sainath, Brian Kingsbury, B. Ramabhadran, P. Fousek, Petr Novák, Abdel-rahman Mohamed","doi":"10.1109/ASRU.2011.6163900","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163900","url":null,"abstract":"To date, there has been limited work in applying Deep Belief Networks (DBNs) for acoustic modeling in LVCSR tasks, with past work using standard speech features. However, a typical LVCSR system makes use of both feature and model-space speaker adaptation and discriminative training. This paper explores the performance of DBNs in a state-of-the-art LVCSR system, showing improvements over Multi-Layer Perceptrons (MLPs) and GMM/HMMs across a variety of features on an English Broadcast News task. In addition, we provide a recipe for data parallelization of DBN training, showing that data parallelization can provide linear speed-up in the number of machines, without impacting WER.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116734991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accent level adjustment in bilingual Thai-English text-to-speech synthesis","authors":"C. Wutiwiwatchai, A. Thangthai, A. Chotimongkol, C. Hansakunbuntheung, N. Thatphithakkul","doi":"10.1109/ASRU.2011.6163947","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163947","url":null,"abstract":"This paper introduces an accent level adjustment mechanism for Thai-English text-to-speech synthesis (TTS). English words often appearing in modern Thai writing can be speech synthesized by either Thai TTS using corresponding Thai phones or by separated English TTS using English phones. As many Thai native listeners may not prefer any of such extreme accent styles, a mechanism that allows selecting accent level preference is proposed. In HMM-based TTS, adjusting the accent level is done by interpolating HMMs of purely Thai and purely English sounds. Solutions for cross-language phone alignment and HMM state mapping are addressed. Evaluations are performed by a listening test on sounds synthesized with varied accent levels. Experimental results show that the proposed method is acceptable by the majority of human listeners.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114714211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}