2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721): Latest Articles

Relationship between dialogue acts and hot spots in meetings
B. Wrede, Elizabeth Shriberg
{"title":"Relationship between dialogue acts and hot spots in meetings","authors":"B. Wrede, Elizabeth Shriberg","doi":"10.1109/ASRU.2003.1318425","DOIUrl":"https://doi.org/10.1109/ASRU.2003.1318425","url":null,"abstract":"We examine the relationship between hot spots (annotated in terms of involvement) and dialogue acts (DAs, annotated in an independent effort) in roughly 32 hours of speech data from naturally-occurring meetings. Results reveal that four independently-motivated involvement categories (non-involved, disagreeing, amused, and other) show statistically significant associations with particular DAs. Further examination shows that involvement is associated with contextual features (such as the speaker or type of meeting), as well as with lexical features (such as utterance length and perplexity). Finally, we found (surprisingly) that perplexities are similar for involved and non-involved utterances. This suggests that it may not be the amount of propositional content, but rather participants' attitudes toward that content, that differentiates hot spots from other regions in a meeting. Overall, these specific correlations, and their relationships to other features, such as perplexity, could provide useful information for the automatic archiving and browsing of natural meetings.","PeriodicalId":394174,"journal":{"name":"2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127143009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 64
A noise-robust ASR front-end using Wiener filter constructed from MMSE estimation of clean speech and noise
Jian Wu, J. Droppo, L. Deng, A. Acero
{"title":"A noise-robust ASR front-end using Wiener filter constructed from MMSE estimation of clean speech and noise","authors":"Jian Wu, J. Droppo, L. Deng, A. Acero","doi":"10.1109/ASRU.2003.1318461","DOIUrl":"https://doi.org/10.1109/ASRU.2003.1318461","url":null,"abstract":"In this paper, we present a novel two-stage framework for designing a noise-robust front-end for automatic speech recognition. In the first stage, a parametric model of acoustic distortion is used to estimate the clean speech and noise spectra in a principled way so that no heuristic parameters need to be set manually. To reduce possible flaws caused by the simplifying assumptions in the parametric model, a second-stage Wiener filtering is applied to further reduce the noise while preserving speech spectra unharmed. This front-end is evaluated on the Aurora2 task. For the multi-condition training scenario, a relative error reduction of 28.4% is achieved.","PeriodicalId":394174,"journal":{"name":"2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122475451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25
Balancing data-driven and rule-based approaches in the context of a Multimodal Conversational System
S. Bangalore, Michael Johnston
{"title":"Balancing data-driven and rule-based approaches in the context of a Multimodal Conversational System","authors":"S. Bangalore, Michael Johnston","doi":"10.1109/ASRU.2003.1318444","DOIUrl":"https://doi.org/10.1109/ASRU.2003.1318444","url":null,"abstract":"We address the issue of combining data-driven and grammar-based models for rapid prototyping of a multimodal conversational system. Moderate-sized rule-based spoken language models for recognition and understanding are easy to develop and provide the ability to prototype conversational applications rapidly. However, scalability of such systems is a bottleneck due to the heavy cost of authoring and maintenance of rule sets and inevitable brittleness due to lack of coverage in the rule sets. In contrast, data-driven approaches are robust and the procedure for model building is usually simple. However, the lack of data in an application context limits the ability to build data-driven models, especially in multimodal systems. We also present methods that reuse data from different domains and investigate the limits of such models in the context of an application domain.","PeriodicalId":394174,"journal":{"name":"2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129567463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35
In search of optimal data selection for training of automatic speech recognition systems
A. Nagroski, L. Boves, H. Steeneken
{"title":"In search of optimal data selection for training of automatic speech recognition systems","authors":"A. Nagroski, L. Boves, H. Steeneken","doi":"10.1109/ASRU.2003.1318405","DOIUrl":"https://doi.org/10.1109/ASRU.2003.1318405","url":null,"abstract":"This paper presents an extended study in the topic of optimal selection of speech data from a database for efficient training of ASR systems. We reconsider a method of optimal selection introduced in our previous work and introduce variosearch as an alternative selection method developed in order to find a representative sample of speech data with a simultaneous control of acoustical and statistical parameters of data selected. Next, we present experiments in which the performance of a standard ASR system trained with data sets selected from a Dutch digits database via different selection methods was compared. The results show that the length of training utterances has a dominant impact on the recognition performance. Therefore, the length of the utterances is a factor that must be taken into account when interpreting phoneme recognition scores.","PeriodicalId":394174,"journal":{"name":"2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128578190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
Data collection and evaluation of AURORA-2 Japanese corpus [speech recognition applications]
Satoshi Nakamura, Kazumasa Yamamoto, K. Takeda, S. Kuroiwa, N. Kitaoka, Takeshi Yamada, M. Mizumachi, T. Nishiura, M. Fujimoto, A. Saso, Toshiki Endo
{"title":"Data collection and evaluation of AURORA-2 Japanese corpus [speech recognition applications]","authors":"Satoshi Nakamura, Kazumasa Yamamoto, K. Takeda, S. Kuroiwa, N. Kitaoka, Takeshi Yamada, M. Mizumachi, T. Nishiura, M. Fujimoto, A. Saso, Toshiki Endo","doi":"10.1109/ASRU.2003.1318511","DOIUrl":"https://doi.org/10.1109/ASRU.2003.1318511","url":null,"abstract":"Speech recognition systems must still be improved when they are exposed to noisy environments. For this improvement, developments of the standard evaluation corpus and assessment technologies are essential. Recently, the AURORA-2,3 corpus and their evaluation scenarios have had significant impact on noisy speech recognition research. This paper introduces a Japanese noisy speech corpus and its evaluation scripts, called AURORA-2J. The AURORA-2J is a Japanese connected digits corpus. The data collection and evaluation scenarios are designed in the same way as AURORA-2 with the help of the ETSI AURORA group. Furthermore, we have collected an in-car speech corpus similar to AURORA-3. The in-car speech corpus includes Japanese connected digits and command words collected in a moving car. This paper describes the data collection, baseline scripts, and its baseline performance.","PeriodicalId":394174,"journal":{"name":"2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123901523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Interactive grammar inference with finite state transducers
S. Caskey, Ezra Story, R. Pieraccini
{"title":"Interactive grammar inference with finite state transducers","authors":"S. Caskey, Ezra Story, R. Pieraccini","doi":"10.1109/ASRU.2003.1318503","DOIUrl":"https://doi.org/10.1109/ASRU.2003.1318503","url":null,"abstract":"We propose a method for improving the coverage of handcrafted context free grammars based on a set of new sentence examples. The described algorithm aims at finding the minimal set of modifications to the grammar that increase its coverage while preserving its original structure. The algorithm is based on a finite state transducer (FST) representation of context free grammars. The inference method includes an interactive component that allows developers to control the generalization of the new grammar.","PeriodicalId":394174,"journal":{"name":"2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128049685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Pronunciation modeling for names of foreign origin
B. Maison, S.F. Chen, P. S. Cohen
{"title":"Pronunciation modeling for names of foreign origin","authors":"B. Maison, S.F. Chen, P. S. Cohen","doi":"10.1109/ASRU.2003.1318479","DOIUrl":"https://doi.org/10.1109/ASRU.2003.1318479","url":null,"abstract":"The pronunciation of a proper name is influenced by both a speaker's native language as well as the language of origin of the name itself. Thus, creating suitable sets of pronunciations for names in speech recognition applications is extremely challenging. We investigate whether automatic language identification and grapheme-to-phoneme conversion algorithms can be effective for this task. We train grapheme-to-phoneme models for eight foreign languages and use automatic language identification to select the models with which to generate additional pronunciations for words in a baseline pronunciation dictionary. As compared to the baseline dictionary in a US name recognition task, we achieve a 25% reduction in sentence-error rate for foreign names spoken by native speakers of the language in question, and a 10% reduction in sentence-error rate for foreign names spoken by American speakers.","PeriodicalId":394174,"journal":{"name":"2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130435564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
Improved language model adaptation using existing and derived external resources
Pi-Chuan Chang, Lin-Shan Lee
{"title":"Improved language model adaptation using existing and derived external resources","authors":"Pi-Chuan Chang, Lin-Shan Lee","doi":"10.1109/ASRU.2003.1318496","DOIUrl":"https://doi.org/10.1109/ASRU.2003.1318496","url":null,"abstract":"The adaptation of language models to obtain better parameters for the topics addressed by the spoken documents to be recognized has been a key issue for speech recognition. In this paper, we propose to collect existing as well as derived external resources for improved language model adaptation. The derived external resources are those retrieved, based on the baseline transcriptions for the input spoken documents, from the Internet using a search engine. The design of queries for such purposes is also analyzed in this paper, in which the special structure of the Chinese language is considered. The obtained existing and derived external resources are then used in the model adaptation, under a clustering-classification framework. Very encouraging results were obtained in the preliminary experiments with two test sets: broadcast news and interview recording.","PeriodicalId":394174,"journal":{"name":"2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134123951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Belief confirmation in spoken dialog systems using confidence measures
C. Raymond, Y. Estève, F. Béchet, R. de Mori, Géraldine Damnati
{"title":"Belief confirmation in spoken dialog systems using confidence measures","authors":"C. Raymond, Y. Estève, F. Béchet, R. de Mori, Géraldine Damnati","doi":"10.1109/ASRU.2003.1318420","DOIUrl":"https://doi.org/10.1109/ASRU.2003.1318420","url":null,"abstract":"The approach proposed is an alternative to the traditional architecture of spoken dialogue systems where the system belief is either not taken into account during the automatic speech recognition process or included in the decoding process but never challenged. By representing all the conceptual structures handled by the dialogue manager by finite state machines and by building a conceptual model that contains all the possible interpretations of a given word-graph, we propose a decoding architecture that searches first for the best conceptual interpretation before looking for the best string of words. Once both N-best sets (at the concept level and at the word level) are generated, a verification process is performed on each N-best set using acoustic and linguistic confidence measures. A first selection strategy that does not include for the moment the dialogue context is proposed and significant error reductions on the understanding measures are obtained.","PeriodicalId":394174,"journal":{"name":"2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134620807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
VTLN-based cross-language voice conversion
D. Sündermann, H. Ney, H. Höge
{"title":"VTLN-based cross-language voice conversion","authors":"D. Sündermann, H. Ney, H. Höge","doi":"10.1109/ASRU.2003.1318521","DOIUrl":"https://doi.org/10.1109/ASRU.2003.1318521","url":null,"abstract":"In speech recognition, vocal tract length normalization (VTLN) is a well-studied technique for speaker normalization. As cross-language voice conversion aims at the transformation of a source speaker's voice into that of a target speaker using a different language, we want to investigate whether VTLN is an appropriate method to adapt the voice characteristics. After applying several conventional VTLN warping functions, we extend the conventional piece-wise linear function to several segments, allowing a more detailed warping of the source spectrum. Experiments on cross-language voice conversion are performed on three corpora of two languages and both speaker genders.","PeriodicalId":394174,"journal":{"name":"2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117146414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 98