Latest Publications: 2011 IEEE Workshop on Automatic Speech Recognition & Understanding

A dialogue system for accessing drug reviews
Jingjing Liu, S. Seneff
2011 IEEE Workshop on Automatic Speech Recognition & Understanding | Pub Date: 2011-12-01 | DOI: 10.1109/ASRU.2011.6163952
Abstract: In this paper, we present a framework which harvests grassroots-generated data from the Web (e.g., reviews, blogs), extracts latent information from these data, and provides a multimodal interface for browsing and querying reviews. A prescription-drug domain system is implemented under this framework. Patient-provided drug reviews were collected from various health-related forums, from which significant side effects correlated with each drug type were identified using association algorithms. A multimodal web-based spoken dialogue system was implemented to allow users to inquire about drugs and correlated side effects as well as browse the reviews obtained from the Web. We report evaluation results on speech recognition, parse coverage, and system response.
Citations: 9
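The abstract does not name the specific association algorithm used to link drugs to side effects, so the following is only an illustrative sketch: scoring drug/side-effect pairs by pointwise mutual information over review-level co-occurrence counts. The drug names, effect terms, and review data are all hypothetical placeholders.

```python
import math
from collections import Counter

# Hypothetical review data: each review mentions one drug and a set of
# side-effect terms extracted from its text (placeholder values).
reviews = [
    ("lipitor", {"muscle pain", "fatigue"}),
    ("lipitor", {"muscle pain"}),
    ("zoloft", {"nausea", "insomnia"}),
    ("zoloft", {"nausea"}),
    ("lipitor", {"headache"}),
]

drug_counts = Counter(d for d, _ in reviews)
effect_counts = Counter(e for _, es in reviews for e in es)
pair_counts = Counter((d, e) for d, es in reviews for e in es)
n = len(reviews)

def pmi(drug, effect):
    """Pointwise mutual information between a drug and a side effect:
    high PMI suggests the effect is reported unusually often for the drug."""
    p_pair = pair_counts[(drug, effect)] / n
    if p_pair == 0.0:
        return float("-inf")
    p_drug = drug_counts[drug] / n
    p_effect = effect_counts[effect] / n
    return math.log(p_pair / (p_drug * p_effect))

print(round(pmi("lipitor", "muscle pain"), 3))  # ~0.511 on this toy data
```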
Bidirectional OM-LSA speech estimator for noise robust speech recognition
Y. Obuchi, Ryu Takeda, M. Togami
2011 IEEE Workshop on Automatic Speech Recognition & Understanding | Pub Date: 2011-12-01 | DOI: 10.1109/ASRU.2011.6163926
Abstract: A new speech enhancement method using a bidirectional speech estimator is introduced. A widely known speech enhancement method using the optimally-modified log spectral amplitude (OM-LSA) speech estimator is re-modified under the assumption that frame-synchronous estimation is not essential in some speech recognition applications. The new method utilizes two separate flows of speech gain estimation: one along the forward direction of time and the other along the backward direction. A simple look-ahead estimation mechanism is also implemented in each flow. By taking the average of these two gains, the speech estimation becomes more robust under various noise conditions. Evaluation experiments using artificial and real noisy speech data confirm that speech recognition accuracy can be greatly improved by the proposed method.
Citations: 6
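The full OM-LSA estimator involves a-priori SNR tracking and speech-presence probabilities; the sketch below strips that down to a generic recursively smoothed Wiener-style gain, just to show the bidirectional idea: run the same one-directional estimator forward and backward in time and average the two gain tracks. The SNR values and smoothing constant are placeholders, not the paper's settings.

```python
import numpy as np

def smoothed_gain(snr_track, alpha=0.9):
    """One-directional recursive smoothing of per-frame gains
    (a simplified stand-in for the full OM-LSA gain recursion)."""
    gains = np.empty(len(snr_track))
    g = snr_track[0] / (1.0 + snr_track[0])  # Wiener-style initial gain
    for t, snr in enumerate(snr_track):
        g = alpha * g + (1 - alpha) * snr / (1.0 + snr)
        gains[t] = g
    return gains

def bidirectional_gain(snr_track):
    """Average the forward gain track and the time-reversed (backward)
    gain track, as in the paper's bidirectional estimator."""
    forward = smoothed_gain(snr_track)
    backward = smoothed_gain(snr_track[::-1])[::-1]
    return 0.5 * (forward + backward)

# Toy a-priori SNR track for a single frequency bin (hypothetical values).
snr = np.array([0.1, 0.5, 2.0, 4.0, 1.0, 0.2])
print(bidirectional_gain(snr).round(3))
```

Because the backward pass sees future frames, the averaged gain reacts to speech onsets earlier than a purely causal estimator, at the cost of giving up frame-synchronous operation.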
Some properties of Bayesian sensing hidden Markov models
G. Saon, Jen-Tzung Chien
2011 IEEE Workshop on Automatic Speech Recognition & Understanding | Pub Date: 2011-12-01 | DOI: 10.1109/ASRU.2011.6163907
Abstract: In Bayesian sensing hidden Markov models (BSHMMs), the acoustic feature vectors are represented by a set of state-dependent basis vectors and by time-dependent sensing weights. The Bayesian formulation comes from assuming state-dependent zero-mean Gaussian priors for the weights and from using marginal likelihood functions obtained by integrating out the weights. Here, we discuss two properties of BSHMMs. The first property is that the marginal likelihood is Gaussian with a factor-analyzed covariance matrix, with the basis providing a low-rank correction to the diagonal covariance of the reconstruction errors. The second property, termed automatic relevance determination, provides a method for discarding basis vectors that are not relevant for encoding feature vectors. This allows model complexity control, where one can initially train a large model and then prune it to a smaller size by removing the basis vectors which correspond to the largest precision values of the sensing weights. The latter property proved useful in successfully deploying models trained on 1800 hours of data during the 2011 DARPA GALE Arabic broadcast news transcription evaluation.
Citations: 9
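The automatic relevance determination property translates into a simple pruning rule: drop the basis vectors whose sensing-weight precisions are largest (a large prior precision forces the weight toward zero, so the vector contributes little). A minimal sketch of that pruning step with a hypothetical interface and random stand-in values; the actual BSHMM training and the factor-analyzed likelihood are not reproduced here.

```python
import numpy as np

def prune_basis(basis, precisions, keep_ratio=0.5):
    """ARD-style pruning: keep the basis vectors whose sensing-weight
    precisions are smallest (most relevant), discard the rest.
    basis: (dim, n_basis); precisions: (n_basis,). Hypothetical interface."""
    n_keep = int(keep_ratio * basis.shape[1])
    order = np.argsort(precisions)      # ascending: smallest precision first
    keep = np.sort(order[:n_keep])      # preserve original basis ordering
    return basis[:, keep], keep

rng = np.random.default_rng(0)
phi = rng.normal(size=(39, 8))          # 8 basis vectors for 39-dim features
prec = rng.gamma(2.0, 1.0, size=8)      # stand-in per-basis weight precisions
phi_small, kept = prune_basis(phi, prec)
print(phi_small.shape, kept)            # (39, 4) and the surviving indices
```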
Robust understanding of spoken Chinese through character-based tagging and prior knowledge exploitation
Weiqun Xu, C. Bao, Yali Li, Jielin Pan, Yonghong Yan
2011 IEEE Workshop on Automatic Speech Recognition & Understanding | Pub Date: 2011-12-01 | DOI: 10.1109/ASRU.2011.6163967
Abstract: Robustness is one of the most challenging issues for spoken language understanding (SLU). In this paper, we study the semantic understanding of spoken Chinese for a voice search dialogue system. We first simplify the problem of semantic understanding into a named entity recognition (NER) task, which is further formulated as sequential tagging. Experiments led us to choose the character rather than the word as the tagging unit. Two approaches are then proposed to incorporate prior knowledge, in the form of a domain lexicon, into the character-based tagging framework. One enriches the tagger's features with lexical features derived from the domain lexicon. The other makes direct use of domain entities by simply adding them to the training data. Experimental results show that both approaches are effective. The best performance is achieved by combining the two complementary approaches. By exploiting prior knowledge, we improved NER performance from 75.27 to 90.24 in F1 score on a field test set using speech recognizer output.
Citations: 0
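The first approach enriches character-level tagger features using a domain lexicon. A simplified sketch of such lexicon-match features, emitting begin/inside/end flags for characters covered by a lexicon entry; the feature names and matching scheme are illustrative, not the paper's exact design.

```python
def lexicon_features(chars, lexicon, max_len=4):
    """For each character position, emit a feature when the character
    begins (LEX-B), continues (LEX-I), or ends (LEX-E) a lexicon entry.
    These flags would be fed to the sequential tagger alongside the
    usual character n-gram features."""
    feats = [[] for _ in chars]
    for i in range(len(chars)):
        for j in range(i + 1, min(i + max_len, len(chars)) + 1):
            if "".join(chars[i:j]) in lexicon:
                feats[i].append("LEX-B")
                for k in range(i + 1, j - 1):
                    feats[k].append("LEX-I")
                if j - 1 > i:
                    feats[j - 1].append("LEX-E")
    return feats

# Toy domain lexicon with nested entries (hypothetical).
lexicon = {"北京", "北京大学"}
print(lexicon_features(list("去北京大学"), lexicon))
```

Note that overlapping lexicon entries simply stack features on the same character, which lets the tagger learn how much to trust ambiguous matches.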
Detection-based accented speech recognition using articulatory features
Chao Zhang, Yi Liu, Chin-Hui Lee
2011 IEEE Workshop on Automatic Speech Recognition & Understanding | Pub Date: 2011-12-01 | DOI: 10.1109/ASRU.2011.6163982
Abstract: We propose an attribute-based approach to accented speech recognition based on automatic speech attribute transcription with high-efficiency detection of articulatory features. In order to utilize appropriate and extensible phonetic and linguistic knowledge, a conditional random field (CRF) is designed to take frame-level inputs with binary feature functions. A CRF with only state features is then used to generate probabilistic phone lattices, solving the phone under-generation problem. Finally, an attribute discrimination module is incorporated to handle a diversity of accent changes without retraining any model, leading to a flexible "plug 'n' play" modular design. The effectiveness of the proposed approach is evaluated on three typical Chinese accents, namely Guanhua, Yue and Wu. Our method yields significant absolute phone recognition accuracy improvements of 5.04%, 4.68% and 6.06% for the three accent types over a conventional monophone HMM system. Compared to a context-dependent triphone HMM system, we achieve comparable phone accuracies at less than 20% of the computation cost. In addition, our proposed method is equally applicable to speaker-independent systems handling multiple accents.
Citations: 13
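The abstract says the CRF takes frame-level inputs through binary feature functions. As a rough illustration of what such inputs could look like, the sketch below thresholds articulatory-attribute detector posteriors into per-frame binary state features; the attribute names, threshold, and feature encoding are hypothetical, and the CRF training itself is omitted.

```python
import numpy as np

def binary_state_features(attr_posteriors, attrs, threshold=0.5):
    """Convert frame-level articulatory-attribute posteriors into binary
    indicator features suitable as CRF state features: one active
    feature per attribute per frame, keyed by its thresholded value."""
    feats = []
    for frame in attr_posteriors:
        feats.append({f"{a}={int(p > threshold)}": 1.0
                      for a, p in zip(attrs, frame)})
    return feats

# Toy detector outputs for two frames over three attributes (placeholders).
attrs = ["voiced", "nasal", "fricative"]
posteriors = np.array([[0.9, 0.1, 0.2],
                       [0.8, 0.7, 0.1]])
for f in binary_state_features(posteriors, attrs):
    print(f)
```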
Query modeling for spoken document retrieval
Berlin Chen, Pei-Ning Chen, Kuan-Yu Chen
2011 IEEE Workshop on Automatic Speech Recognition & Understanding | Pub Date: 2011-12-01 | DOI: 10.1109/ASRU.2011.6163963
Abstract: Spoken document retrieval (SDR) has recently become a more interesting research avenue due to increasing volumes of publicly available multimedia associated with speech information. Many efforts have been devoted to developing elaborate indexing and modeling techniques for representing spoken documents, but only a few to improving query formulations for better representing users' information needs. In view of this, we recently presented a language modeling framework exploring a novel use of relevance information cues for improving query effectiveness. Our work in this paper continues this general line of research in two main aspects. We further explore various ways to glean both relevance and non-relevance cues from the spoken document collection so as to enhance query modeling in an unsupervised fashion. Furthermore, we also investigate representing the query and documents with different granularities of index features to work in conjunction with the various relevance and/or non-relevance cues. Experiments conducted on the TDT (Topic Detection and Tracking) SDR task demonstrate the performance merits of the methods instantiated from our retrieval framework when compared to other existing retrieval methods.
Citations: 5
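The abstract speaks of gleaning relevance cues from the collection in an unsupervised fashion. A generic pseudo-relevance-feedback sketch in that spirit: treat the top-ranked first-pass documents as relevant and expand the query with their frequent terms. This is a common baseline technique, not the paper's specific language-modeling estimator; the documents and terms are toy placeholders.

```python
from collections import Counter

def expand_query(query_terms, ranked_docs, k=3, n_expansion=2):
    """Pseudo-relevance feedback: assume the top-k documents from a
    first-pass retrieval are relevant, then add their most frequent
    unseen terms to the query."""
    counts = Counter()
    for doc in ranked_docs[:k]:
        counts.update(doc)
    expansion = [w for w, _ in counts.most_common()
                 if w not in query_terms][:n_expansion]
    return list(query_terms) + expansion

# Toy first-pass ranking, each document as a term list (hypothetical).
docs = [["typhoon", "taiwan", "storm", "storm"],
        ["typhoon", "rain", "taiwan"],
        ["election", "vote"]]
print(expand_query(["typhoon"], docs))  # ['typhoon', 'taiwan', 'storm']
```

Non-relevance cues would work the opposite way, e.g. down-weighting terms that are frequent in low-ranked documents; the paper explores both directions within a language-modeling framework.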
Strategies for using MLP based features with limited target-language training data
Y. Qian, Ji Xu, Daniel Povey, Jia Liu
2011 IEEE Workshop on Automatic Speech Recognition & Understanding | Pub Date: 2011-12-01 | DOI: 10.1109/ASRU.2011.6163957
Abstract: Recently there has been some interest in the question of how to build LVCSR systems when there is only a limited amount of acoustic training data in the target language, but possibly more plentiful data in other languages. In this paper we investigate approaches using MLP based features. We experiment with two approaches: one is based on Automatic Speech Attribute Transcription (ASAT), in which we train classifiers to learn articulatory features. The other approach uses only the target-language data and relies on combination of multiple MLPs trained on different subsets. After system combination we get large improvements of more than 10% relative versus a conventional baseline. These feature-level approaches may also be combined with other, model-level methods for the multilingual or low-resource scenario.
Citations: 18
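The second approach combines multiple MLPs trained on different data subsets. A minimal sketch of one common combination scheme, frame-level log-posterior averaging, offered only as an illustration since the abstract does not detail the exact combination method used.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax over class scores."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def combine_mlp_outputs(posterior_list, eps=1e-10):
    """Average log-posteriors from MLPs trained on different subsets.
    The result can serve as a tandem-style feature stream (e.g., after
    dimensionality reduction) for a conventional GMM-HMM system."""
    stacked = np.stack([np.log(p + eps) for p in posterior_list])
    return stacked.mean(axis=0)

# Toy outputs of two MLPs: 5 frames, 40 phone classes (random stand-ins).
rng = np.random.default_rng(1)
mlp_a = softmax(rng.normal(size=(5, 40)))
mlp_b = softmax(rng.normal(size=(5, 40)))
print(combine_mlp_outputs([mlp_a, mlp_b]).shape)  # (5, 40)
```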
Factored adaptation for separable compensation of speaker and environmental variability
M. Seltzer, A. Acero
2011 IEEE Workshop on Automatic Speech Recognition & Understanding | Pub Date: 2011-12-01 | DOI: 10.1109/ASRU.2011.6163921
Abstract: While many algorithms for speaker or environment adaptation have been proposed, far less attention has been paid to approaches which address both factors. We recently proposed a method called factored adaptation that can jointly compensate for speaker and environmental mismatch using a cascade of CMLLR transforms that separately compensate for the environment and speaker variability. Performing adaptation in this manner enables a speaker transform estimated in one environment to be applied when the same user is in different environments. While this algorithm performed well, it relied on knowledge of the operating environment in both training and test. In this paper, we show how unsupervised environment clustering can be used to eliminate this requirement. The improved factored adaptation algorithm achieves relative improvements of 10-18% over conventional CMLLR when applying speaker transforms across environments without needing any additional a priori knowledge.
Citations: 16
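The cascade itself is easy to state concretely: the environment transform is applied to the features first and the speaker transform on top, so the speaker transform can be reused when the same speaker appears in a different environment. A schematic sketch with placeholder transforms; estimating the CMLLR matrices is the actual hard part and is omitted here.

```python
import numpy as np

def apply_cmllr(x, A, b):
    """Apply one affine CMLLR feature transform: x' = A x + b."""
    return A @ x + b

def factored_transform(x, env, spk):
    """Factored adaptation cascade: environment compensation first,
    then the speaker transform estimated on environment-compensated
    features. Swapping in a different `env` reuses the same `spk`."""
    (A_e, b_e), (A_s, b_s) = env, spk
    return apply_cmllr(apply_cmllr(x, A_e, b_e), A_s, b_s)

# Toy 39-dim feature vector and near-identity transforms (placeholders).
dim = 39
rng = np.random.default_rng(2)
env = (np.eye(dim) * 0.95, rng.normal(scale=0.1, size=dim))
spk = (np.eye(dim) * 1.05, rng.normal(scale=0.1, size=dim))
x = rng.normal(size=dim)
print(factored_transform(x, env, spk)[:3].round(3))
```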
Analyzing conversations using rich phrase patterns
Bin Zhang, Alex Marin, Brian Hutchinson, Mari Ostendorf
2011 IEEE Workshop on Automatic Speech Recognition & Understanding | Pub Date: 2011-12-01 | DOI: 10.1109/ASRU.2011.6163972
Abstract: Individual words are not powerful enough for many complex language classification problems. N-gram features include word context information, but are limited to contiguous word sequences. In this paper, we propose to use phrase patterns to extend n-grams for analyzing conversations, using a discriminative approach to learning patterns with a combination of words and word classes to address data sparsity issues. Improvements in performance are reported for two conversation analysis tasks: speaker role recognition and alignment classification.
Citations: 2
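Phrase patterns mix surface words with word classes to fight sparsity. The sketch below enumerates all word/class variants of each n-gram, which captures the basic mixing idea; the paper's discriminative pattern learning is not reproduced, and the class map is a hypothetical placeholder.

```python
from itertools import product

def phrase_patterns(tokens, word2class, n=3):
    """Enumerate n-gram patterns where each position may be either the
    surface word or its word class. A discriminative learner would then
    select the useful patterns from this candidate space."""
    patterns = set()
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        # Each position offers the word itself or its class (if any).
        choices = [(w, word2class.get(w, w)) for w in gram]
        for combo in product(*choices):
            patterns.add(" ".join(combo))
    return patterns

# Toy word-class map (hypothetical).
w2c = {"monday": "DAY", "tuesday": "DAY", "two": "NUM"}
for p in sorted(phrase_patterns("see you monday at two".split(), w2c)):
    print(p)
```

A pattern like "see you DAY" then fires on "see you tuesday" as well, which a plain word trigram cannot do.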
Linear versus mel frequency cepstral coefficients for speaker recognition
Xinhui Zhou, D. Garcia-Romero, R. Duraiswami, C. Espy-Wilson, S. Shamma
2011 IEEE Workshop on Automatic Speech Recognition & Understanding | Pub Date: 2011-12-01 | DOI: 10.1109/ASRU.2011.6163888
Abstract: Mel-frequency cepstral coefficients (MFCC) have been the dominant features in speaker recognition as well as in speech recognition. However, based on theories of speech production, some speaker characteristics associated with the structure of the vocal tract, particularly the vocal tract length, are reflected more in the high frequency range of speech. This insight suggests that a linear frequency scale may provide some advantages in speaker recognition over the mel scale. Based on two state-of-the-art speaker recognition back-end systems (one Joint Factor Analysis system and one Probabilistic Linear Discriminant Analysis system), this study compares the performance of MFCC and LFCC (linear frequency cepstral coefficients) in the NIST SRE (Speaker Recognition Evaluation) 2010 extended-core task. Our results in SRE10 show that, while the two are complementary, LFCC consistently outperforms MFCC, mainly due to its better performance in the female trials. This can be explained by the relatively shorter vocal tract in females and the resulting higher formant frequencies in speech. LFCC benefits female speech more by better capturing the spectral characteristics in the high frequency region. In addition, our results show some advantage of LFCC over MFCC in reverberant speech. LFCC is as robust as MFCC in babble noise, but not in white noise. It is concluded that LFCC should be more widely used, at least for the female trials, by the mainstream of the speaker recognition community.
Citations: 144
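The only front-end difference between LFCC and MFCC is the spacing of the triangular filters: linear versus mel. A minimal sketch that makes this explicit by toggling the filter-edge spacing; framing, pre-emphasis, liftering, and deltas are omitted, and the input is a random stand-in for a power spectrogram.

```python
import numpy as np
from scipy.fftpack import dct

def filterbank_cepstra(power_spec, sr, n_filters=24, n_ceps=13, mel=False):
    """Cepstra from triangular filters spaced either linearly (LFCC)
    or on the mel scale (MFCC); the filter spacing is the sole
    difference between the two front ends compared in the paper."""
    n_fft = (power_spec.shape[-1] - 1) * 2
    f_max = sr / 2
    if mel:
        to_mel = lambda f: 2595 * np.log10(1 + f / 700)
        from_mel = lambda m: 700 * (10 ** (m / 2595) - 1)
        edges = from_mel(np.linspace(0, to_mel(f_max), n_filters + 2))
    else:
        edges = np.linspace(0, f_max, n_filters + 2)  # linear spacing
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fbank = np.zeros((n_filters, power_spec.shape[-1]))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    log_energy = np.log(power_spec @ fbank.T + 1e-10)
    return dct(log_energy, type=2, norm="ortho")[..., :n_ceps]

# Toy power spectrogram: 4 frames, 257 FFT bins (random placeholder).
spec = np.abs(np.random.default_rng(3).normal(size=(4, 257))) ** 2
lfcc = filterbank_cepstra(spec, sr=16000, mel=False)
mfcc = filterbank_cepstra(spec, sr=16000, mel=True)
print(lfcc.shape, mfcc.shape)  # (4, 13) (4, 13)
```

With linear spacing, roughly half the filters sit above 4 kHz, which is where the paper argues vocal-tract-length cues are concentrated; the mel scale devotes most of its resolution to the low frequencies instead.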