2011 IEEE Workshop on Automatic Speech Recognition & Understanding最新文献_第9页

A dialogue system for accessing drug reviews 访问药物审查的对话系统

2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163952

Jingjing Liu, S. Seneff

引用次数: 9

Bidirectional OM-LSA speech estimator for noise robust speech recognition 用于噪声鲁棒语音识别的双向OM-LSA语音估计器

2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163926

Y. Obuchi, Ryu Takeda, M. Togami

引用次数: 6

Some properties of Bayesian sensing hidden Markov models 贝叶斯感知隐马尔可夫模型的一些性质

2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163907

G. Saon, Jen-Tzung Chien

{"title":"Some properties of Bayesian sensing hidden Markov models","authors":"G. Saon, Jen-Tzung Chien","doi":"10.1109/ASRU.2011.6163907","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163907","url":null,"abstract":"In Bayesian sensing hidden Markov models (BSHMMs) the acoustic feature vectors are represented by a set of state-dependent basis vectors and by time-dependent sensing weights. The Bayesian formulation comes from assuming state-dependent zero mean Gaussian priors for the weights and from using marginal likelihood functions obtained by integrating out the weights. Here, we discuss two properties of BSHMMs. The first property is that the marginal likelihood is Gaussian with a factor analyzed covariance matrix with the basis providing a low-rank correction to the diagonal covariance of the reconstruction errors. The second property, termed automatic relevance determination, provides a method for discarding basis vectors that are not relevant for encoding feature vectors. This allows model complexity control where one can initially train a large model and then prune it to a smaller size by removing the basis vectors which correspond to the largest precision values of the sensing weights. The last property turned out to be useful in successfully deploying models trained on 1800 hours of data during the 2011 DARPA GALE Arabic broadcast news transcription evaluation.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132711632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 9

Robust understanding of spoken Chinese through character-based tagging and prior knowledge exploitation 通过基于字符的标注和先验知识开发，对汉语口语有较强的理解

2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163967

Weiqun Xu, C. Bao, Yali Li, Jielin Pan, Yonghong Yan

引用次数: 0

Detection-based accented speech recognition using articulatory features 基于检测的基于发音特征的重音语音识别

2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163982

Chao Zhang, Yi Liu, Chin-Hui Lee

{"title":"Detection-based accented speech recognition using articulatory features","authors":"Chao Zhang, Yi Liu, Chin-Hui Lee","doi":"10.1109/ASRU.2011.6163982","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163982","url":null,"abstract":"We propose an attribute-based approach to accented speech recognition based on automatic speech attribute transcription with high efficiency detection of articulatory features. In order to utilize appropriate and extensible phonetic and linguistic knowledge, conditional random field (CRF) is designed to take frame-level inputs with binary feature functions. The use of CRF with merely the state features to generate probabilistic phone lattices is then utilized to solve the phone under-generation problem. Finally an attribute discrimination module is incorporated to handle a diversity of accent changes without retraining any model, leading to flexible “plug ‘n’ play” modular design. The effectiveness of the proposed approach is evaluated on three typical Chinese accents, namely Guanhua, Yue and Wu. Our method yields a significant absolute phone recognition accuracy improvement 5.04%, 4.68% and 6.06% for the corresponding three accent types over a conventional monophone HMM system. Compared to a context-dependent triphone HMM system, we achieve comparable phone accuracies at only less than 20% of the computation cost. In addition, our proposed method is equally applicable to speaker-independent systems handling multiple accents.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130688325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 13

Query modeling for spoken document retrieval 用于口语文档检索的查询建模

2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163963

Berlin Chen, Pei-Ning Chen, Kuan-Yu Chen

{"title":"Query modeling for spoken document retrieval","authors":"Berlin Chen, Pei-Ning Chen, Kuan-Yu Chen","doi":"10.1109/ASRU.2011.6163963","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163963","url":null,"abstract":"Spoken document retrieval (SDR) has recently become a more interesting research avenue due to increasing volumes of publicly available multimedia associated with speech information. Many efforts have been devoted to developing elaborate indexing and modeling techniques for representing spoken documents, but only few to improving query formulations for better representing the users' information needs. In view of this, we recently presented a language modeling framework exploring a novel use of relevance information cues for improving query effectiveness. Our work in this paper continues this general line of research in two main aspects. We further explore various ways to glean both relevance and non-relevance cues from the spoken document collection so as to enhance query modeling in an unsupervised fashion. Furthermore, we also investigate representing the query and documents with different granularities of index features to work in conjunction with the various relevance and/or non-relevance cues. Experiments conducted on the TDT (Topic Detection and Tracking) SDR task demonstrate the performance merits of the methods instantiated from our retrieval framework when compared to other existing retrieval methods.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115411064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5

Strategies for using MLP based features with limited target-language training data 在有限的目标语言训练数据下使用基于MLP的特征的策略

2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163957

Y. Qian, Ji Xu, Daniel Povey, Jia Liu

引用次数: 18

Factored adaptation for separable compensation of speaker and environmental variability 说话人与环境可变性可分离补偿的因子自适应

2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163921

M. Seltzer, A. Acero

引用次数: 16

Analyzing conversations using rich phrase patterns 使用丰富的短语模式分析对话

2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163972

Bin Zhang, Alex Marin, Brian Hutchinson, Mari Ostendorf

引用次数: 2

Linear versus mel frequency cepstral coefficients for speaker recognition 线性与mel频率倒谱系数用于说话人识别

2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163888

Xinhui Zhou, D. Garcia-Romero, R. Duraiswami, C. Espy-Wilson, S. Shamma

{"title":"Linear versus mel frequency cepstral coefficients for speaker recognition","authors":"Xinhui Zhou, D. Garcia-Romero, R. Duraiswami, C. Espy-Wilson, S. Shamma","doi":"10.1109/ASRU.2011.6163888","DOIUrl":"https://doi.org/10.1109/ASRU.2011.6163888","url":null,"abstract":"Mel-frequency cepstral coefficients (MFCC) have been dominantly used in speaker recognition as well as in speech recognition. However, based on theories in speech production, some speaker characteristics associated with the structure of the vocal tract, particularly the vocal tract length, are reflected more in the high frequency range of speech. This insight suggests that a linear scale in frequency may provide some advantages in speaker recognition over the mel scale. Based on two state-of-the-art speaker recognition back-end systems (one Joint Factor Analysis system and one Probabilistic Linear Discriminant Analysis system), this study compares the performances between MFCC and LFCC (Linear frequency cepstral coefficients) in the NIST SRE (Speaker Recognition Evaluation) 2010 extended-core task. Our results in SRE10 show that, while they are complementary to each other, LFCC consistently outperforms MFCC, mainly due to its better performance in the female trials. This can be explained by the relatively shorter vocal tract in females and the resulting higher formant frequencies in speech. LFCC benefits more in female speech by better capturing the spectral characteristics in the high frequency region. In addition, our results show some advantage of LFCC over MFCC in reverberant speech. LFCC is as robust as MFCC in the babble noise, but not in the white noise. It is concluded that LFCC should be more widely used, at least for the female trials, by the mainstream of the speaker recognition community.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"299 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128617675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 144