2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU): Latest Papers

Two extensions to ensemble speaker and speaking environment modeling for robust automatic speech recognition
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430087
Yu Tsao, Chin-Hui Lee
Recently, an ensemble speaker and speaking environment modeling (ESSEM) approach to characterizing unknown testing environments was studied for robust speech recognition. Each environment is modeled by a super-vector consisting of the entire set of mean vectors from all Gaussian densities of a set of HMMs for a particular environment. The super-vector for a new testing environment is then obtained by an affine transformation on the ensemble super-vectors. In this paper, we propose a minimum classification error training procedure to obtain discriminative ensemble elements, and a super-vector clustering technique to achieve refined ensemble structures. We test these two extensions to ESSEM on Aurora2. In a per-utterance unsupervised adaptation mode, we achieved an average WER of 4.99% from 0 dB to 20 dB conditions with these two extensions, compared with a 5.51% WER obtained with the ML-trained gender-dependent baseline. To our knowledge this represents the best result reported in the literature on the Aurora2 connected digit recognition task.
Citations: 10
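As a rough illustration of the super-vector idea in the abstract above, the sketch below stacks per-Gaussian mean vectors into one super-vector per environment and forms the super-vector of a new environment as a weighted sum of the ensemble plus a bias. The function names, the toy numbers, and the fixed weights are assumptions made for illustration; the abstract does not say how the transformation parameters are estimated, and this is only one simple reading of "affine transformation".

```python
def make_supervector(mean_vectors):
    """Concatenate the mean vectors of all Gaussians into one flat list."""
    return [x for mean in mean_vectors for x in mean]


def affine_combine(ensemble, weights, bias):
    """Return sum_p weights[p] * ensemble[p] + bias, element by element."""
    out = list(bias)
    for w, supervector in zip(weights, ensemble):
        for i in range(len(out)):
            out[i] += w * supervector[i]
    return out


# Two toy environments, each with two 2-dimensional Gaussian means (made-up values).
clean = make_supervector([[1.0, 2.0], [3.0, 4.0]])
noisy = make_supervector([[2.0, 2.0], [5.0, 3.0]])

# Hypothetical weights and bias; in practice these would be estimated from test data.
new_env = affine_combine([clean, noisy], weights=[0.5, 0.5], bias=[0.0, 0.0, 0.0, 0.0])
```

With equal weights and a zero bias, the new super-vector is simply the element-wise average of the two ensemble members.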
A fast-match approach for robust, faster than real-time speaker diarization
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430196
Yan Huang, Oriol Vinyals, G. Friedland, Christian A. Müller, Nikki Mirghafori, Chuck Wooters
During the past few years, speaker diarization has achieved satisfying accuracy in terms of speaker Diarization Error Rate (DER). The most successful approaches, based on agglomerative clustering, however, exhibit an inherent computational complexity which makes real-time processing, especially in combination with further processing steps, almost impossible. In this article we present a framework to speed up agglomerative clustering speaker diarization. The basic idea is to adopt a computationally cheap method to reduce the hypothesis space of the more expensive and accurate model selection via Bayesian Information Criterion (BIC). Two strategies, based on the pitch-correlogram and the unscented-transform-based approximation of KL-divergence, are used independently as a fast-match approach to select the most likely clusters to merge. We performed the experiments using the existing ICSI speaker diarization system. The new system using the KL-divergence fast-match strategy performs only 14% of the total BIC comparisons needed in the baseline system and speeds up the system by 41% without affecting the speaker Diarization Error Rate (DER). The result is a robust and faster than real-time speaker diarization system.
Citations: 45
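The fast-match idea described above can be sketched in a few lines: a cheap distance ranks all cluster pairs, and the more expensive BIC test runs only on a short list of the closest pairs. This is a simplified sketch under stated assumptions, not the ICSI system. Each cluster is reduced to a single 1-D Gaussian given as (mean, variance, frame count), the cheap distance is the closed-form symmetric KL divergence between Gaussians (the paper uses an unscented-transform approximation for mixtures), and the merge test is the textbook ΔBIC for single Gaussians. The names `sym_kl`, `delta_bic`, `best_merge`, and `shortlist_size` are hypothetical.

```python
import math
from itertools import combinations


def sym_kl(a, b):
    """Symmetric KL divergence between two 1-D Gaussians (mean, var, n)."""
    (m1, v1, _), (m2, v2, _) = a, b
    kl_ab = 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
    kl_ba = 0.5 * (math.log(v1 / v2) + (v2 + (m1 - m2) ** 2) / v1 - 1.0)
    return kl_ab + kl_ba


def delta_bic(a, b, lam=1.0):
    """Delta-BIC for merging two 1-D Gaussian clusters; negative favors a merge."""
    (m1, v1, n1), (m2, v2, n2) = a, b
    n = n1 + n2
    # Mean and variance of the merged cluster from sufficient statistics.
    m = (n1 * m1 + n2 * m2) / n
    v = (n1 * (v1 + m1 * m1) + n2 * (v2 + m2 * m2)) / n - m * m
    likelihood_term = 0.5 * (n * math.log(v) - n1 * math.log(v1) - n2 * math.log(v2))
    penalty = lam * math.log(n)  # 0.5 * (2 free parameters) * log n in one dimension
    return likelihood_term - penalty


def best_merge(clusters, shortlist_size=2):
    """Rank all pairs by the cheap distance, then run BIC only on the shortlist."""
    pairs = sorted(combinations(range(len(clusters)), 2),
                   key=lambda p: sym_kl(clusters[p[0]], clusters[p[1]]))
    shortlist = pairs[:shortlist_size]
    return min(shortlist, key=lambda p: delta_bic(clusters[p[0]], clusters[p[1]]))


# Toy clusters: the first two are nearly identical, the third is far away.
clusters = [(0.0, 1.0, 200), (0.1, 1.0, 200), (5.0, 1.0, 200)]
merge_pair = best_merge(clusters)
```

With three clusters and a shortlist of two, only two of the three possible BIC comparisons are evaluated; the saving grows with the number of clusters because the number of pairs is quadratic.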
Automatic lexical pronunciations generation and update
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430113
Ghinwa F. Choueiter, S. Seneff, James R. Glass
Most automatic speech recognizers use a dictionary that maps words to one or more canonical pronunciations. Such entries are typically hand-written by lexical experts. In this research, we investigate a new approach for automatically generating lexical pronunciations using a linguistically motivated subword model, and refining the pronunciations with spoken examples. The approach is evaluated on an isolated word recognition task with a 2k lexicon of restaurant and street names. A letter-to-sound model is first used to generate seed baseforms for the lexicon. Then spoken utterances of words in the lexicon are presented to a subword recognizer and the top hypotheses are used to update the lexical baseforms. The spelling of each word is also used to constrain the subword search space and generate spelling-constrained baseforms. The results obtained are quite encouraging and indicate that our approach can be successfully used to learn valid pronunciations of new words.
Citations: 5
Semantic translation error rate for evaluating translation systems
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430144
Krishna Subramanian, D. Stallard, R. Prasad, S. Saleem, P. Natarajan
In this paper, we introduce a new metric, which we call the semantic translation error rate, or STER, for evaluating the performance of machine translation systems. STER is based on the previously published translation error rate (TER) (Snover et al., 2006) and METEOR (Banerjee and Lavie, 2005) metrics. Specifically, STER extends TER in two ways: first, by incorporating word equivalence measures (WordNet and Porter stemming) that are standard in METEOR, and second, by disallowing alignments of concept words to non-concept words (aka stop words). We show how these features make STER alignments better suited for human-driven analysis than standard TER. We also present experimental results that show that STER is better correlated with human judgments than TER. Finally, we compare STER to METEOR, and illustrate that METEOR scores computed using the STER alignments have similar statistical properties to METEOR scores computed using METEOR alignments.
Citations: 7
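The two extensions named in the abstract above can be illustrated with a small word-level edit distance. This is only a sketch under several assumptions: it omits the block shifts that TER allows, it replaces WordNet and Porter stemming with a tiny hand-written equivalence table, it uses a toy stop-word list, and it models a disallowed concept-to-stop-word alignment as an infinite substitution cost (so the aligner must use a deletion plus an insertion instead). None of the names below come from the paper.

```python
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "is"}  # toy list, an assumption

# Toy stand-in for WordNet synonyms and Porter stemming: word -> canonical form.
EQUIV = {"cars": "car", "automobile": "car", "drives": "drive", "driving": "drive"}


def canon(word):
    """Map a word to its canonical form, or return it unchanged."""
    return EQUIV.get(word, word)


def sub_cost(hyp_word, ref_word):
    """Substitution cost with word equivalence and the stop-word constraint."""
    if canon(hyp_word) == canon(ref_word):
        return 0  # equivalent words count as a match
    if (hyp_word in STOP_WORDS) != (ref_word in STOP_WORDS):
        return float("inf")  # concept word may not align to a stop word
    return 1


def edit_rate(hyp, ref):
    """Word edits (insert, delete, substitute) divided by reference length."""
    n, m = len(hyp), len(ref)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1,          # delete a hypothesis word
                          d[i][j - 1] + 1,          # insert a reference word
                          d[i - 1][j - 1] + sub_cost(hyp[i - 1], ref[j - 1]))
    return d[n][m] / m


rate_equiv = edit_rate("the automobile drives".split(), "the car drives".split())
rate_stop = edit_rate("a car".split(), "red car".split())
```

In the second example a plain edit distance would substitute "a" for "red" at a cost of one edit; under the constraint the pair cannot be aligned, so the cost is one deletion plus one insertion.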
Towards robust automatic evaluation of pathologic telephone speech
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430200
K. Riedhammer, G. Stemmer, T. Haderlein, M. Schuster, F. Rosanowski, E. Nöth, A. Maier
For many aspects of speech therapy an objective evaluation of the intelligibility of a patient's speech is needed. We investigate the evaluation of the intelligibility of speech by means of automatic speech recognition. Previous studies have shown that measures like word accuracy are consistent with human experts' ratings. To ease the patient's burden, it is highly desirable to conduct the assessment via phone. However, the telephone channel influences the quality of the speech signal, which negatively affects the results. To reduce inaccuracies, we propose a combination of two speech recognizers. Experiments on two sets of pathological speech show that the combination results in consistent improvements in the correlation between the automatic evaluation and the ratings by human experts. Furthermore, the approach leads to reductions of 10% and 25% of the maximum error of the intelligibility measure.
Citations: 15
Phonological feature based variable frame rate scheme for improved speech recognition
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430177
A. Sangwan, J. Hansen
In this paper, we propose a new scheme for variable frame rate (VFR) feature processing based on high level segmentation (HLS) of speech into broad phone classes. Traditional fixed-rate processing is not capable of accurately reflecting the dynamics of continuous speech. On the other hand, the proposed VFR scheme adapts the temporal representation of the speech signal by tying the framing strategy to the detected phone class sequence. The phone classes are detected and segmented by using appropriately trained phonological features (PFs). In this manner, the proposed scheme is capable of tracking the evolution of speech due to the underlying phonetic content, and exploiting the non-uniform information flow-rate of speech by using a variable framing strategy. The new VFR scheme is applied to automatic speech recognition of TIMIT and NTIMIT corpora, where it is compared to a traditional fixed window-size/frame-rate scheme. Our experiments yield encouraging results with relative reductions of 24% and 8% in WER (word error rate) for TIMIT and NTIMIT tasks, respectively.
Citations: 2
A language modeling approach to question answering on speech transcripts
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430112
Matthias H. Heie, E. Whittaker, Josef R. Novak, S. Furui
This paper presents a language modeling approach to sentence retrieval for Question Answering (QA) that we used in Question Answering on speech transcripts (QAst), a pilot task at the Cross Language Evaluation Forum (CLEF) evaluations 2007. A language model (LM) is generated for each sentence and these models are combined with document LMs to take advantage of contextual information. A query expansion technique using class models is proposed and included in our framework. Finally, our method's impact on exact answer extraction is evaluated. We show that combining sentence LMs with document LMs significantly improves sentence retrieval performance, and that this sentence retrieval approach leads to better answer extraction performance.
Citations: 2
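One common way to combine a sentence LM with a document LM is linear interpolation of unigram probabilities, and the sketch below ranks sentences by the query log-likelihood under that mixture. Whether the paper combines its models this way is not stated in the abstract, so the interpolation form, the weight `lam`, the probability floor, and all function names are assumptions for illustration.

```python
import math
from collections import Counter


def unigram_lm(tokens):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def query_log_prob(query, sent_lm, doc_lm, lam=0.7, floor=1e-9):
    """Log-likelihood of the query under lam * P_sentence + (1 - lam) * P_document."""
    logp = 0.0
    for word in query:
        p = lam * sent_lm.get(word, 0.0) + (1.0 - lam) * doc_lm.get(word, 0.0)
        logp += math.log(max(p, floor))  # floor avoids log(0) for unseen words
    return logp


# Toy document of two sentences (made-up text).
sentences = ["the workshop was held in kyoto".split(),
             "it took place in december".split()]
doc_lm = unigram_lm([word for sent in sentences for word in sent])
query = "where was the workshop held".split()

scores = [query_log_prob(query, unigram_lm(sent), doc_lm) for sent in sentences]
best_sentence = max(range(len(sentences)), key=lambda i: scores[i])
```

The document LM gives every sentence some probability mass for words that appear elsewhere in the same document, which is how contextual information enters the sentence score.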
Call classification for automated troubleshooting on large corpora
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430110
Keelan Evanini, David Suendermann-Oeft, R. Pieraccini
This paper compares six algorithms for call classification in the framework of a dialog system for automated troubleshooting. The comparison is carried out on large datasets, each consisting of over 100,000 utterances from two domains: television (TV) and Internet (INT). In spite of the high number of classes (79 for TV and 58 for INT), the best classifier (maximum entropy on word bigrams) achieved more than 77% classification accuracy on the TV dataset and 81% on the INT dataset.
Citations: 19
Combining statistical models with symbolic grammar in parsing
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430140
Junichi Tsujii
There are two streams of research in computational linguistics and natural language processing, the empiricist and rationalist traditions. Theories and computational techniques in these two streams have been developed separately and are different in nature. Although the two traditions have been considered irreconcilable and have often been antagonistic toward each other, I contest this assertion and claim that these two research streams in linguistics, despite or due to their differences, can be complementary to each other and should be combined into a unified methodology. I will demonstrate in my talk that there have been interesting developments in this direction of integration, and would like to discuss some of the recent results with their implications for engineering applications.
Citations: 0
Variational Kullback-Leibler divergence for Hidden Markov models
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430132
J. Hershey, P. Olsen, Steven J. Rennie
Divergence measures are widely used tools in statistics and pattern recognition. The Kullback-Leibler (KL) divergence between two hidden Markov models (HMMs) would be particularly useful in the fields of speech and image recognition. Whereas the KL divergence is tractable for many distributions, including Gaussians, it is not in general tractable for mixture models or HMMs. Recently, variational approximations have been introduced to efficiently compute the KL divergence and Bhattacharyya divergence between two mixture models, by reducing them to the divergences between the mixture components. Here we generalize these techniques to approach the divergence between HMMs using a recursive backward algorithm. Two such methods are introduced, one of which yields an upper bound on the KL divergence, the other of which yields a recursive closed-form solution. The KL and Bhattacharyya divergences, as well as a weighted edit-distance technique, are evaluated for the task of predicting the confusability of pairs of words.
Citations: 22
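The abstract above notes that the KL divergence is tractable for Gaussians and that variational approximations for mixtures reduce to divergences between components. The sketch below shows both pieces for 1-D Gaussian mixtures. The mixture formula used here is one commonly cited variational approximation from earlier work on Gaussian mixtures, reproduced from memory and not taken from this abstract; it does not implement the HMM recursion the paper proposes. The function names and the (weight, mean, variance) tuple layout are assumptions.

```python
import math


def kl_gauss(m1, v1, m2, v2):
    """Closed-form D(N(m1, v1) || N(m2, v2)) for 1-D Gaussians (v = variance)."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def kl_variational(f, g):
    """Variational approximation to D(f || g) for 1-D Gaussian mixtures.

    Each mixture is a list of (weight, mean, variance) tuples. For every
    component a of f, the approximation compares how well a is covered by
    the components of f against how well it is covered by those of g.
    """
    total = 0.0
    for wa, ma, va in f:
        covered_by_f = sum(w * math.exp(-kl_gauss(ma, va, m, v)) for w, m, v in f)
        covered_by_g = sum(w * math.exp(-kl_gauss(ma, va, m, v)) for w, m, v in g)
        total += wa * math.log(covered_by_f / covered_by_g)
    return total


f = [(0.4, 0.0, 1.0), (0.6, 3.0, 0.5)]
g = [(0.5, 0.5, 1.0), (0.5, 2.5, 1.0)]
approx_fg = kl_variational(f, g)
```

Two sanity checks follow directly from the formula: the approximation is zero when both mixtures are identical, and for single-component mixtures it reduces to the exact Gaussian KL divergence.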