Latest publications from the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding

New perspectives on spoken language understanding: Does machine need to fully understand speech?
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373502
Tatsuya Kawahara
{"title":"New perspectives on spoken language understanding: Does machine need to fully understand speech?","authors":"Tatsuya Kawahara","doi":"10.1109/ASRU.2009.5373502","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373502","url":null,"abstract":"Spoken Language Understanding (SLU) has been traditionally formulated to extract meanings or concepts of user utterances in the context of human-machine dialogue. With the broadened coverage of spoken language processing, the tasks and methodologies of SLU have been changed accordingly. The back-end of spoken dialogue systems now consist of not only relational databases (RDB) but also general documents, incorporating information retrieval (IR) and question-answering (QA) techniques. This paradigm shift and the author's approaches are reviewed. SLU is also being designed to cover human-human dialogues and multi-party conversations. Major approaches to “understand” human-human speech communication and a new approach based on the lister's reactions are reviewed. As a whole, these trends are apparently not oriented for full understanding of spoken language, but for robust extraction of clue information.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"192 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125304787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 15
A hierarchical structure for modeling inter and intra phonetic information for phoneme recognition
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373272
D. Vásquez, Guillermo Aradilla, R. Gruhn, W. Minker
{"title":"A hierarchical structure for modeling inter and intra phonetic information for phoneme recognition","authors":"D. Vásquez, Guillermo Aradilla, R. Gruhn, W. Minker","doi":"10.1109/ASRU.2009.5373272","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373272","url":null,"abstract":"In this paper, we present a two-layer hierarchical structure based on neural networks for phoneme recognition. The proposed structure attempts to model only the characteristics within a phoneme, i.e., intra-phonetic information. This differs from other state-of-the-art hierarchical structures where the first layer typically models the intra-phonetic information while the second layer focuses on modeling the contextual (inter-phonetic) information. An advantage of the proposed model is that it can be added to another layer that focuses on the inter-phonetic information. In this paper, we also show that the categorization between intra- and inter-phonetic information also allows to extend other state-of-the-art hierarchical approaches. A phoneme accuracy of 77.89% is achieved on the TIMIT database, which compares favorably to the best results obtained on this database.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131313941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Topic-based speaker recognition for German parliamentary speeches
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5372907
Doris Baum
{"title":"Topic-based speaker recognition for German parliamentary speeches","authors":"Doris Baum","doi":"10.1109/ASRU.2009.5372907","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5372907","url":null,"abstract":"In the last decade, high-level features for speaker recognition have become a research focus, as they are believed to alleviate the weak point of the classical spectral/cepstral-feature-based approaches: mismatch in acoustic conditions or channel between training and test data. Identification cues such as prosody, pronunciation, and idiolect have been successfully investigated. Semantic speaker recognition, such as identifying people by the topics they frequently talk about, has not found an equal amount of attention. However, it is a promising approach, especially for broadcast data and multimedia archives, where prominent speakers can be expected to often talk about their specific subjects. This paper reports on our experiments with topic-based speaker recognition on German parliamentary speeches. Text transcripts of speeches of federal ministers were used to train speaker models based on word frequencies. For recognition, these models were applied to automatic speech recognition transcripts of parliamentary speeches and could identify the correct speaker surprisingly well, with an EER of 13.8%. Fusing this approach with a classical GMM-UBM system (with EER 14.3%) yields an improved EER of 8.6%.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122101260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Extractive speech summarization by active learning
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373269
J. Zhang, R. Chan, Pascale Fung
{"title":"Extractive speech summarization by active learning","authors":"J. Zhang, R. Chan, Pascale Fung","doi":"10.1109/ASRU.2009.5373269","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373269","url":null,"abstract":"In this paper, we propose an active learning approach for feature-based extractive summarization of lecture speech. Most state-of-the-art speech summarization systems are trained by using a large amount of human reference summaries. Active learning targets to minimize human annotation efforts by automatically selecting a small amount of unlabeled examples for labeling. Our method chooses the unlabeled examples according to a combination of informativeness criterion and robustness criterion. Our summarization results show an increasing learning curve of ROUGE-L F-measure, from 0.44 to 0.54, consistently higher than that of using randomly chosen training samples. We also show that, by following the rhetorical structure in presentation slides, it is possible for humans to produce Ȝgold standardȝ reference summaries with very high inter-labeler agreement.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114606520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Representing the Reinforcement Learning state in a negotiation dialogue
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373413
P. Heeman
{"title":"Representing the Reinforcement Learning state in a negotiation dialogue","authors":"P. Heeman","doi":"10.1109/ASRU.2009.5373413","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373413","url":null,"abstract":"Most applications of Reinforcement Learning (RL) for dialogue have focused on slot-filling tasks. In this paper, we explore a task that requires negotiation, in which conversants need to exchange information in order to decide on a good solution. We investigate what information should be included in the system's RL state so that an optimal policy can be learned and so that the state space stays reasonable in size. We propose keeping track of the decisions that the system has made, and using them to constrain the system's future behavior in the dialogue. In this way, we can compositionally represent the strategy that the system is employing. We show that this approach is able to learn a good policy for the task. This work is a first step to a more general exploration of applying RL to negotiation dialogues.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130189230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
Voice-based information retrieval — how far are we from the text-based information retrieval?
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5372952
Lin-Shan Lee, Yi-Cheng Pan
{"title":"Voice-based information retrieval — how far are we from the text-based information retrieval ?","authors":"Lin-Shan Lee, Yi-Cheng Pan","doi":"10.1109/ASRU.2009.5372952","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5372952","url":null,"abstract":"Although network content access is primarily text-based today, almost all roles of text can be accomplished by voice. Voice-based information retrieval refers to the situation that the user query and/or the content to be retried are in form of voice. This paper tries to compare the voice-based information retrieval with the currently very successful text-based information retrieval, and identifies two major issues in which voice-based information retrieval is far behind: retrieval accuracy and user-system interaction. These two issues are reviewed, analyzed and discussed in detail. It is found that very good approaches have been proposed and very good improvements have been achieved, although there is still a very long way to go. A few successful prototype systems, among many others are presented at the end.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132006832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 19
On speeding phoneme recognition in a hierarchical MLP structure
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373278
D. Vásquez, Guillermo Aradilla, R. Gruhn, W. Minker
{"title":"On speeding phoneme recognition in a hierarchical MLP structure","authors":"D. Vásquez, Guillermo Aradilla, R. Gruhn, W. Minker","doi":"10.1109/ASRU.2009.5373278","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373278","url":null,"abstract":"In this paper, we propose a technique for speeding phoneme recognition in a hierarchical structure involving multilayered perceptrons (MLPs). The hierarchical structure consists of two MLP-based layers, where the output of the first layer is used as input for the second layer. In this paper, we efficiently speed up the system by removing the redundant information contained at the output of the first layer. Several techniques are investigated for removing this redundant information based on temporal and phonetic criteria. The best approach reduces the computational time by 57% while keeping a system accuracy comparable to the standard hierarchical approach. This scheme favors the implementation of such hierarchical structures in real-time applications.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123503066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Pronunciation modeling for dialectal Arabic speech recognition
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373245
Hassan Al-Haj, Roger Hsiao, Ian Lane, A. Black, A. Waibel
{"title":"Pronunciation modeling for dialectal arabic speech recognition","authors":"Hassan Al-Haj, Roger Hsiao, Ian Lane, A. Black, A. Waibel","doi":"10.1109/ASRU.2009.5373245","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373245","url":null,"abstract":"Short vowels in Arabic are normally omitted in written text which leads to ambiguity in the pronunciation. This is even more pronounced for dialectal Arabic where a single word can be pronounced quite differently based on the speaker's nationality, level of education, social class and religion. In this paper we focus on pronunciation modeling for Iraqi-Arabic speech. We introduce multiple pronunciations into the Iraqi speech recognition lexicon, and compare the performance, when weights computed via forced alignment are assigned to the different pronunciations of a word. Incorporating multiple pronunciations improved recognition accuracy compared to a single pronunciation baseline and introducing pronunciation weights further improved performance. Using these techniques an absolute reduction in word-error-rate of 2.4% was obtained compared to the baseline system.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128467145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
Sub-structure-based estimation of pronunciation proficiency and classification of learners
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373275
Masayuki Suzuki, N. Minematsu, Dean Luo, K. Hirose
{"title":"Sub-structure-based estimation of pronunciation proficiency and classification of learners","authors":"Masayuki Suzuki, N. Minematsu, Dean Luo, K. Hirose","doi":"10.1109/ASRU.2009.5373275","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373275","url":null,"abstract":"Automatic estimation of pronunciation proficiency has its specific difficulty. Adequacy in controlling the vocal organs can be estimated from spectral envelopes of input utterances but the envelope patterns are also affected easily by different speakers. To develop a pedagogically sound method for automatic estimation, the envelope changes caused by linguistic factors and those by extra-linguistic factors should be properly separated. For this aim, in our previous study [1], we proposed a mathematically-guaranteed and linguistically-valid speaker-invariant representation of pronunciation, called speech structure. After the proposal, we have examined that representation also for ASR [2], [3], [4] and, through these works, we have learned better how to apply speech structures to various tasks. In this paper, we focus on a proficiency estimation experiment done in [1] and, based on our recently proposed techniques for the structures, we carry out that experiment again but under new and different conditions. Here, we use smaller units of structural analysis, speaker-invariant substructures, and relative structural distances between a learner and a teacher. Results show that correlations between human and machine rating are improved and also show extremely higher robustness to speaker differences compared to widely used GOP scores. Further, we also demonstrate that the proposed representation can classify learners purely based on their pronunciation proficiency, not affected by their age and gender.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128725998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
An improved parallel model combination method for noisy speech recognition
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373332
H. Veisi, H. Sameti
{"title":"An improved parallel model combination method for noisy speech recognition","authors":"H. Veisi, H. Sameti","doi":"10.1109/ASRU.2009.5373332","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373332","url":null,"abstract":"In this paper a novel method, called PC-PMC, is proposed to improve the performance of automatic speech recognition systems in noisy environments. This method is based on the parallel model combination (PMC) technique and uses the Cepstral Mean Subtraction (CMS) normalization ability and Principal Component Analysis (PCA) compression and de-correlation capabilities. It takes the advantages of both additive noise compensation of PMC and convolutive noise removal ability of CMS and PCA. The first problem to be solved in the realizing of PC-PMC is that PMC algorithm requires invertible modules in the front-end of the system while CMS normalization is not an invertible process. Also, it is required to design a framework for adaptation of the PCA transform in the presence of noise. The method proposed in this paper provides solutions to the both problems. Our evaluations are done on four different real noisy tasks using Nevisa Persian continuous speech recognition system. Experimental results demonstrate significant reduction in word error rate using PC-PMC in comparison with the standard robustness methods.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130933593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2