2009 IEEE Workshop on Automatic Speech Recognition & Understanding: Latest Publications

Correlation-based query relaxation for example-based dialog modeling
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date: 2009-12-01 DOI: 10.1109/ASRU.2009.5373242
Cheongjae Lee, Sungjin Lee, Sangkeun Jung, Kyungduk Kim, Donghyeon Lee, G. G. Lee
Abstract: Query relaxation refers to the process of reducing the number of constraints on a query if it returns no result when searching a database. This is an important process to enable extraction of an appropriate number of query results, because queries that are too strictly constrained may return no result, whereas queries that are too loosely constrained may return too many results. This paper proposes an automated method of correlation-based query relaxation (CBQR) to select an appropriate constraint subset. The example-based dialog modeling framework was used to validate our algorithm. Preliminary results show that the proposed method facilitates the automation of query relaxation. We believe that the CBQR algorithm effectively relaxes constraints on failed queries to return more dialog examples.
Citations: 13
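The general idea behind query relaxation can be illustrated with a minimal sketch: drop constraints from a failing database query until enough results come back. The drop-order policy below is a naive placeholder (first remaining key); the paper's CBQR method instead chooses the constraint subset using correlation statistics, which this sketch does not implement.

```python
def relax_query(search, constraints, min_results=1):
    """Relax a failing query by dropping constraints until `search`
    returns at least `min_results` results.

    `search` maps a dict of constraints to a list of results. The
    drop order here (first remaining key) is a naive stand-in for
    CBQR's correlation-based constraint selection."""
    active = dict(constraints)
    while active:
        results = search(active)
        if len(results) >= min_results:
            return results, active
        active.pop(next(iter(active)))  # naive policy: drop the first remaining constraint
    return search(active), active
```

For example, a query for a hotel in a city with no hotels can be relaxed to the `type` constraint alone, returning hotels in other cities as candidate dialog examples.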
Large-margin feature adaptation for automatic speech recognition
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date: 2009-12-01 DOI: 10.1109/ASRU.2009.5373320
Chih-Chieh Cheng, Fei Sha, L. Saul
Abstract: We consider how to optimize the acoustic features used by hidden Markov models (HMMs) for automatic speech recognition (ASR). We investigate a mistake-driven algorithm that discriminatively reweights the acoustic features in order to separate the log-likelihoods of correct and incorrect transcriptions by a large margin. The algorithm simultaneously optimizes the HMM parameters in the back end by adapting them to the reweighted features computed by the front end. Using an online approach, we incrementally update feature weights and model parameters after the decoding of each training utterance. To mitigate the strongly biased gradients from individual training utterances, we train several different recognizers in parallel while tying the feature transformations in their front ends. We show that this parameter-tying across different recognizers leads to more stable updates and generally fewer recognition errors.
Citations: 4
Integrating prosodic features in extractive meeting summarization
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date: 2009-12-01 DOI: 10.1109/ASRU.2009.5373302
Shasha Xie, Dilek Z. Hakkani-Tür, Benoit Favre, Yang Liu
Abstract: Speech contains additional information beyond the text that can be valuable for automatic speech summarization. In this paper, we evaluate how to effectively use acoustic/prosodic features for extractive meeting summarization, and how to integrate prosodic features with lexical and structural information for further improvement. To properly represent prosodic features, we propose different normalization methods based on speaker, topic, or local context information. Our experimental results show that using only the prosodic features we achieve better performance than using the non-prosodic information on both the human transcripts and recognition output. In addition, a decision-level combination of the prosodic and non-prosodic features yields further gain, outperforming the individual models.
Citations: 56
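The speaker-based normalization the authors mention can be illustrated with a per-speaker z-score, which removes each speaker's baseline (e.g. habitual pitch level) so prosodic values become comparable across speakers. This is a generic sketch; the paper's exact normalization variants (and its topic- and context-based versions) may differ.

```python
from collections import defaultdict
from statistics import mean, pstdev

def speaker_normalize(utterances):
    """Z-normalize one prosodic feature (e.g. mean F0) per speaker.

    `utterances` is a list of (speaker_id, feature_value) pairs.
    Returns the same pairs with values normalized by that speaker's
    mean and standard deviation, so a value of +1.0 means "one
    standard deviation above this speaker's own average"."""
    by_spk = defaultdict(list)
    for spk, val in utterances:
        by_spk[spk].append(val)
    # Guard against zero deviation for speakers with constant values.
    stats = {spk: (mean(vals), pstdev(vals) or 1.0) for spk, vals in by_spk.items()}
    return [(spk, (val - stats[spk][0]) / stats[spk][1]) for spk, val in utterances]
```

After this step, a high-pitched speaker and a low-pitched speaker who both raise their pitch for an important utterance produce similar normalized values.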
Multilingual speaker age recognition: Regression analyses on the Lwazi corpus
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date: 2009-12-01 DOI: 10.1109/ASRU.2009.5373374
M. Feld, E. Barnard, C. V. Heerden, Christian A. Müller
Abstract: Multilinguality represents an area of significant opportunities for automatic speech-processing systems: whereas multilingual societies are commonplace, the majority of speech-processing systems are developed with a single language in mind. As a step towards improved understanding of multilingual speech processing, the current contribution investigates how an important para-linguistic aspect of speech, namely speaker age, depends on the language spoken. In particular, we study how certain speech features affect the performance of an age recognition system for different South African languages in the Lwazi corpus. By optimizing our feature set and performing language-specific tuning, we are working towards true multilingual classifiers. As they are closely related, ASR and dialog systems are likely to benefit from an improved classification of the speaker. In a comprehensive corpus analysis on long-term features, we have identified features that exhibit characteristic behaviors for particular languages. In a follow-up regression experiment, we confirm the suitability of our feature selection for age recognition and present cross-language error rates. The mean absolute error ranges between 7.7 and 12.8 years for same-language predictors and rises to 14.5 years for cross-language predictors.
Citations: 13
Leveraging speech production knowledge for improved speech recognition
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date: 2009-12-01 DOI: 10.1109/ASRU.2009.5373368
A. Sangwan, J. Hansen
Abstract: This study presents a novel phonological methodology for speech recognition based on phonological features (PFs) which leverages the relationship between speech phonology and phonetics. In particular, the proposed scheme estimates the likelihood of observing speech phonology given an associative lexicon. In this manner, the scheme is capable of choosing the most likely hypothesis (word candidate) among a group of competing alternative hypotheses. The framework employs the Maximum Entropy (ME) model to learn the relationship between phonetics and phonology. Subsequently, we extend the ME model to a ME-HMM (maximum entropy-hidden Markov model) which captures the speech production and linguistic relationship between phonology and words. The proposed ME-HMM model is applied to the task of re-processing N-best lists, where absolute WRA (word recognition accuracy) increases of 1.7%, 1.9%, and 1% are reported for the TIMIT, NTIMIT, and SPINE (speech in noise) corpora (15.5% and 22.5% relative reduction in word error rate for TIMIT and NTIMIT).
Citations: 3
The exploration/exploitation trade-off in Reinforcement Learning for dialogue management
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date: 2009-12-01 DOI: 10.1109/ASRU.2009.5373260
S. Varges, G. Riccardi, S. Quarteroni, A. Ivanov
Abstract: Conversational systems use deterministic rules that trigger actions such as requests for confirmation or clarification. More recently, Reinforcement Learning and (Partially Observable) Markov Decision Processes have been proposed for this task. In this paper, we investigate action selection strategies for dialogue management, in particular the exploration/exploitation trade-off and its impact on final reward (i.e. the session reward after optimization has ended) and lifetime reward (i.e. the overall reward accumulated over the learner's lifetime). We propose to use interleaved exploitation sessions as a learning methodology to assess the reward obtained from the current policy. The experiments show a statistically significant difference in final reward of exploitation-only sessions between a system that optimizes lifetime reward and one that maximizes the reward of the final policy.
Citations: 4
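The exploration/exploitation trade-off the paper studies is commonly realized with an epsilon-greedy action selector: explore a random dialog action with probability epsilon, otherwise exploit the current value estimates. This is a textbook sketch of the trade-off itself, not the authors' interleaved-exploitation methodology, and the action names are hypothetical.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Select a dialog action from `q_values` (action -> estimated value).

    With probability `epsilon`, explore by sampling a random action;
    otherwise exploit by taking the action with the highest estimate.
    High epsilon gathers information (lifetime reward may suffer);
    low epsilon maximizes reward under the current policy."""
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)           # explore
    return max(actions, key=q_values.get)    # exploit
```

Setting epsilon to zero corresponds to a pure exploitation session, the kind the authors interleave with learning to measure the reward of the current policy.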
From speech to letters - using a novel neural network architecture for grapheme based ASR
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date: 2009-12-01 DOI: 10.1109/ASRU.2009.5373257
F. Eyben, M. Wöllmer, Björn Schuller, Alex Graves
Abstract: Mainstream automatic speech recognition systems are based on modelling acoustic sub-word units such as phonemes. Phonemisation dictionaries and language-model-based decoding techniques are applied to transform the phoneme hypothesis into orthographic transcriptions. Direct modelling of graphemes as sub-word units using HMMs has not been successful. We investigate a novel ASR approach using Bidirectional Long Short-Term Memory Recurrent Neural Networks and Connectionist Temporal Classification, which is capable of transcribing graphemes directly and yields results highly competitive with phoneme transcription. In the design of such a grapheme-based speech recognition system, phonemisation dictionaries are no longer required. All that is needed is text transcribed at the sentence level, which greatly simplifies the training procedure. The novel approach is evaluated extensively on the Wall Street Journal 1 corpus.
Citations: 51
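The Connectionist Temporal Classification output layer used here emits one label (grapheme or blank) per frame; an output string is recovered by collapsing the frame-level path: merge consecutive repeats, then drop blanks. A minimal sketch of that collapse step (greedy best-path decoding, not the full network) looks like this:

```python
def ctc_collapse(frame_labels, blank="-"):
    """Greedy CTC decoding step: turn a per-frame label path into an
    output string by merging consecutive repeated labels and then
    removing blank symbols."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return "".join(out)
```

For instance, the frame path `h h - e - - l l - l l o` collapses to "hello"; the blank between the two `l` runs is what lets CTC emit a doubled letter, which matters for grapheme-level transcription.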
Ontology-based grounding of Spoken Language Understanding
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date: 2009-12-01 DOI: 10.1109/ASRU.2009.5373500
S. Quarteroni, Marco Dinarelli, G. Riccardi
Abstract: Current Spoken Language Understanding models rely on either hand-written semantic grammars or flat attribute-value sequence labeling. In most cases, no relations between concepts are modeled, and both concepts and relations are domain-specific, making it difficult to expand or port the domain model. In contrast, we expand our previous work on a domain model based on an ontology where concepts follow the predicate-argument semantics and domain-independent classical relations are defined on such concepts. We conduct a thorough study on a spoken dialog corpus collected within a customer care problem-solving domain, and we evaluate the coverage and impact of the ontology for the interpretation, grounding and re-ranking of spoken language understanding interpretations.
Citations: 2
Towards integrated machine translation using structural alignment from syntax-augmented synchronous parsing
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date: 2009-12-01 DOI: 10.1109/ASRU.2009.5372892
Bing Xiang, Bowen Zhou, Martin Cmejrek
Abstract: In current statistical machine translation, IBM-model-based word alignment is widely used as a starting point to build phrase-based machine translation systems. However, such an alignment model is separated from the rest of the machine translation pipeline and optimized independently. Furthermore, structural information is not taken into account in the alignment model, which sometimes leads to incorrect alignments. In this paper, we present a novel method to connect a re-alignment model with a translation model in an integrated framework. We conduct bilingual chart parsing based on a syntax-augmented synchronous context-free grammar. A Viterbi derivation tree is generated for each sentence pair with multiple features employed in a log-linear model. A new word alignment is created under the structural constraint from the Viterbi tree. Extensive experiments are conducted on a Farsi-to-English translation task in the conversational speech domain and a German-to-English translation task in the text domain. Systems trained on the new alignment provide significantly higher BLEU scores compared to a state-of-the-art baseline.
Citations: 0
Response timing generation and response type selection for a spontaneous spoken dialog system
2009 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date: 2009-12-01 DOI: 10.1109/ASRU.2009.5372898
Ryota Nishimura, S. Nakagawa
Abstract: If a dialog system can respond to a user as naturally as a human, the interaction will appear smoother. In this research, we aim to develop a dialog system that emulates human behavior in a chat-like dialog. The proposed system makes use of a decision tree to generate chat-like responses at the appropriate times. These responses include "aizuchi" (back-channel), "repetition", "collaborative completion", etc. The system also reacts robustly to the user's overlapping utterances (barge-in) and disfluencies. The subjective evaluation shows a high degree of naturalness in the timing of ordinary responses, overlap, and aizuchi, and that the dialog system exhibits user-friendly behavior. The system using recorded voices was preferred; almost all subjects felt familiarity with the aizuchi, and the barge-in handling was also found useful.
Citations: 5