IEEE Workshop on Automatic Speech Recognition and Understanding, 2001 (ASRU '01): Latest Publications

Speech data retrieval system constructed on a universal phonetic code domain
IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01. Pub Date: 2001-12-09. DOI: 10.1109/ASRU.2001.1034652
K. Tanaka, Y. Itoh, H. Kojima, Nahoko Fujimura
{"title":"Speech data retrieval system constructed on a universal phonetic code domain","authors":"K. Tanaka, Y. Itoh, H. Kojima, Nahoko Fujimura","doi":"10.1109/ASRU.2001.1034652","DOIUrl":"https://doi.org/10.1109/ASRU.2001.1034652","url":null,"abstract":"We propose a novel speech processing framework, where all of the speech data are encoded into universal phonetic code (UPC) sequences and speech processing systems, such as speech recognition, retrieval, digesting, etc., are constructed on this UPC domain. As the first step, we introduce a sub-phonetic segment (SPS) set, based on IPA (international phonetic alphabet), to deal with multilingual speech and develop a procedure to estimate acoustic models of the SPS from IPA-like phone models. The key point of the framework is to employ environment adaptation into the SPS encoding stage. This makes it possible to normalize acoustic variations and extract the language factor contained in speech signals as encoded SPS sequences. We confirm these characteristics by constructing a speech retrieval system on the SPS domain. The system can retrieve key phrases, given by speech, from different environment speech data in a vocabulary-free condition. We show several preliminary experimental results on this system, using Japanese and English sentence speech sets.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129965062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 23
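To make the retrieval idea concrete: once speech is encoded into discrete phonetic-code sequences, vocabulary-free key-phrase search reduces to approximate symbol matching. Below is a minimal Python sketch of that step, using edit-distance substring matching over toy code strings; the symbols, function names, and costs are illustrative assumptions, not the authors' SPS set or implementation.

def min_edit_substring_cost(query, doc):
    # Best approximate occurrence of `query` inside `doc`: a standard
    # edit-distance DP where the match may start anywhere in `doc` for free.
    m, n = len(query), len(doc)
    prev = [0] * (n + 1)                 # row 0 all zero: free start position
    for i in range(1, m + 1):
        cur = [i]                        # deleting query symbols costs 1 each
        for j in range(1, n + 1):
            cur.append(min(prev[j - 1] + (query[i - 1] != doc[j - 1]),
                           prev[j] + 1, cur[-1] + 1))
        prev = cur
    return min(prev)                     # match may end anywhere in `doc`

def retrieve(query_codes, database, max_cost=2):
    # Rank utterances by how cheaply the query phrase embeds in their codes.
    hits = [(uid, min_edit_substring_cost(query_codes, codes))
            for uid, codes in database.items()]
    return sorted((h for h in hits if h[1] <= max_cost), key=lambda h: h[1])

# Toy database of already-encoded utterances (made-up codes):
db = {"utt1": "sil k a t sil d o g sil".split(),
      "utt2": "sil k a p sil".split()}
print(retrieve("k a t".split(), db))     # [('utt1', 0), ('utt2', 1)]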
Ubiquitous speech communication interface
IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01. Pub Date: 2001-12-09. DOI: 10.1109/ASRU.2001.1034595
B. Juang
{"title":"Ubiquitous speech communication interface","authors":"B. Juang","doi":"10.1109/ASRU.2001.1034595","DOIUrl":"https://doi.org/10.1109/ASRU.2001.1034595","url":null,"abstract":"The Holy Grail of telecommunication is to bring people thousands miles apart, anytime, anywhere, together to communicate as if they were having a face-to-face conversation in a ubiquitous telepresence scenario. One key component necessary to reach this Holy Grail is the technology that supports hands-free speech communication. Hands-free telecommunication (both telephony and teleconferencing) refers to a communication mode in which the participants interact with each other over a communication network, without having to wear or hold any special device. For speech communications, we normally need a loudspeaker, a microphone or a headset. The goal of hands-free speech communication is thus to provide the users with an intelligent voice interface, which provides high quality communication and is safe, convenient, and natural to use. This goal stipulates many challenging technical issues, such as multiple sound sources, echo and reverberation in the room, and natural human-machine interaction, the resolution of which needs to be integrated into a working system before the benefit of hands-free telecommunication can be realized. We analyze these issues and review progress made in the last two decades, particularly from the viewpoint of signal acquisition, restoration and enhancement. We lay out new technical dimensions that may lead to further advances towards realization of a truly ubiquitous speech communication interface to an intelligent information source, be it a human or a machine.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130249812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
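The abstract surveys the hands-free problem rather than one algorithm, but acoustic echo is the most concrete of the issues it names, so here is a minimal sketch of the textbook remedy, an NLMS adaptive echo canceller, run on synthetic signals. The filter length, step size, and simulated room response are assumptions for illustration only, not anything from the paper.

import numpy as np

def nlms_echo_cancel(far_end, mic, taps=64, mu=0.5, eps=1e-8):
    # Adaptively estimate the echo path and subtract the estimated echo.
    w = np.zeros(taps)                    # echo-path estimate
    x_buf = np.zeros(taps)                # most recent far-end samples
    out = np.zeros_like(mic)
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        e = mic[n] - w @ x_buf            # residual after echo removal
        w += mu * e * x_buf / (x_buf @ x_buf + eps)   # NLMS update
        out[n] = e
    return out

rng = np.random.default_rng(0)
far = rng.standard_normal(8000)                       # loudspeaker signal
room = 0.5 * np.exp(-np.arange(64) / 8.0) * rng.standard_normal(64)
mic = np.convolve(far, room)[:8000]                   # echo-only microphone
res = nlms_echo_cancel(far, mic)
print("echo power before/after:", float(mic.var()), float(res[4000:].var()))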
Dynamic sharings of Gaussian densities using phonetic features
IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01. Pub Date: 2001-12-09. DOI: 10.1109/ASRU.2001.1034675
Kyung-Tak Lee, C. Wellekens
{"title":"Dynamic sharings of Gaussian densities using phonetic features","authors":"Kyung-Tak Lee, C. Wellekens","doi":"10.1109/ASRU.2001.1034675","DOIUrl":"https://doi.org/10.1109/ASRU.2001.1034675","url":null,"abstract":"This paper describes a way to adapt the recognizer to pronunciation variability by dynamically sharing Gaussian densities across phonetic models. The method is divided in three steps. First, given an input utterance, an HMM recognizer outputs a lattice of the most likely word hypotheses. Then, the canonical pronunciation of each hypothesis is checked by comparing its theoretical phonetic features to those automatically extracted from speech. If the comparisons show that a phoneme of an hypothesis has likely been pronounced differently, its model is transformed by sharing its Gaussian densities with the ones of its possible alternate phone realization(s). Finally, the transformed models are used in a second-pass recognition. Sharings are dynamic because they are automatically adapted to each input speech. Experiments showed a 5.4% relative reduction in word error rate compared to the baseline and a 2.7% compared to a static method.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132905929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
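A minimal sketch of the sharing transformation itself: pooling the Gaussians of a canonical phone model with those of a likely alternate realization. The abstract does not specify the weight-combination rule, so the averaging below (the alpha parameter) is an assumption, and the GMM class is a toy stand-in for real HMM state distributions.

import numpy as np

class GMM:
    # Stand-in for an HMM state's Gaussian mixture (diagonal covariances).
    def __init__(self, weights, means, variances):
        self.w = np.asarray(weights, float)
        self.mu = np.asarray(means, float)
        self.var = np.asarray(variances, float)

    def log_likelihood(self, x):
        d = self.mu.shape[1]
        quad = ((x - self.mu) ** 2 / self.var).sum(axis=1)
        logdet = np.log(self.var).sum(axis=1)
        logp = np.log(self.w) - 0.5 * (d * np.log(2 * np.pi) + logdet + quad)
        m = logp.max()                    # log-sum-exp over mixture components
        return float(m + np.log(np.exp(logp - m).sum()))

def share_densities(canonical, alternate, alpha=0.5):
    # Pool the Gaussians of the canonical phone and its likely alternate
    # realization; the alpha weighting is an illustrative assumption.
    w = np.concatenate([alpha * canonical.w, (1 - alpha) * alternate.w])
    return GMM(w / w.sum(),
               np.concatenate([canonical.mu, alternate.mu]),
               np.concatenate([canonical.var, alternate.var]))

a = GMM([1.0], [[0.0, 0.0]], [[1.0, 1.0]])   # canonical phone state
b = GMM([1.0], [[2.0, 2.0]], [[1.0, 1.0]])   # alternate realization
shared = share_densities(a, b)
x = np.array([1.0, 1.0])                      # acoustically ambiguous frame
print(a.log_likelihood(x), shared.log_likelihood(x))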
Continuous multi-band speech recognition using Bayesian networks
IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01. Pub Date: 2001-12-09. DOI: 10.1109/ASRU.2001.1034584
K. Daoudi, D. Fohr, Christophe Antoine
{"title":"Continuous multi-band speech recognition using Bayesian networks","authors":"K. Daoudi, D. Fohr, Christophe Antoine","doi":"10.1109/ASRU.2001.1034584","DOIUrl":"https://doi.org/10.1109/ASRU.2001.1034584","url":null,"abstract":"Using the Bayesian networks framework, we present a new multi-band approach for continuous speech recognition. This new approach has the advantage of overcoming all the limitations of the standard multi-band techniques. Moreover, it leads to a higher fidelity speech modeling than HMMs. We provide a preliminary evaluation of the performance of our new approach on a connected digits recognition task.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114461172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 17
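For readers unfamiliar with the multi-band setup the paper builds on, the sketch below shows the basic mechanics: split spectral features into sub-bands, score each band with its own model, and recombine the scores. The equal-weight recombination encodes the naive assumption that bands are independent; the paper's Bayesian-network coupling between bands is precisely what this toy version lacks. All names and the stand-in band scorer are illustrative.

import numpy as np

def split_bands(spectral_frames, n_bands):
    # Split (T, F) spectral features into n_bands groups along frequency.
    return np.array_split(spectral_frames, n_bands, axis=1)

def combine_band_scores(band_loglikes, weights=None):
    # Weighted recombination of per-band log-likelihoods; equal weights
    # correspond to treating the bands as independent streams.
    band_loglikes = np.asarray(band_loglikes, float)
    if weights is None:
        weights = np.full(len(band_loglikes), 1.0 / len(band_loglikes))
    return float(weights @ band_loglikes)

frames = np.random.default_rng(1).standard_normal((100, 24))  # fake filterbank
bands = split_bands(frames, n_bands=4)
scores = [-0.5 * float((b ** 2).sum()) for b in bands]  # stand-in band models
print(combine_band_scores(scores))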
Bridging the gap between mixed-initiative dialogs and reusable sub-dialogs
IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01. Pub Date: 2001-12-09. DOI: 10.1109/ASRU.2001.1034641
S. Kronenberg, P. Regel-Brietzman
{"title":"Bridging the gap between mixed-initiative dialogs and reusable sub-dialogs","authors":"S. Kronenberg, P. Regel-Brietzman","doi":"10.1109/ASRU.2001.1034641","DOIUrl":"https://doi.org/10.1109/ASRU.2001.1034641","url":null,"abstract":"For easing the development process for dialog systems it is desired that reusable dialog components provide pre-packaged functionality 'out-of-the-box' that enables developers to quickly build applications by providing standard default settings and behavior. Additionally, human-computer interaction should become more human-like in that mixed-initiative dialogs are supported. Mixed-initiative interaction requires the system to react to user initiated application specific commands whereby reusable dialog components have to be application independent to be used in different settings. This article presents a dialog mechanism, so called meta-dialog, which is responsible for the control flow between reusable sub-dialogs and mixed-initiative dialogs.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123534773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
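A minimal sketch of how such a meta-dialog layer can be organized: application-independent sub-dialogs supply the default flow, while the meta-dialog intercepts application-specific commands so the user can take the initiative at any point. The class and method names are invented for illustration and do not reflect the paper's actual interfaces.

class SubDialog:
    # A reusable, application-independent component that fills one slot.
    def __init__(self, prompt, slot):
        self.prompt, self.slot = prompt, slot

    def step(self, user_input, state):
        state[self.slot] = user_input
        return True                      # this simple sub-dialog always finishes

class MetaDialog:
    # Owns the control flow between sub-dialogs and app-specific commands.
    def __init__(self, subdialogs, commands):
        self.queue = list(subdialogs)    # default (system-initiative) flow
        self.commands = commands         # application-specific handlers
        self.state = {}

    def handle(self, user_input):
        # Mixed initiative: user commands may pre-empt the default flow.
        for trigger, handler in self.commands.items():
            if trigger in user_input.lower():
                return handler(self.state, user_input)
        if self.queue and self.queue[0].step(user_input, self.state):
            self.queue.pop(0)
        return self.queue[0].prompt if self.queue else f"Done: {self.state}"

md = MetaDialog([SubDialog("Where from?", "origin"),
                 SubDialog("Where to?", "dest")], commands={})
md.commands["help"] = lambda state, _: "Say a city name, or 'help'."
print(md.handle("Paris"))    # fills 'origin', prompts "Where to?"
print(md.handle("help"))     # app-specific command pre-empts the flow
print(md.handle("London"))   # -> Done: {'origin': 'Paris', 'dest': 'London'}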
Multispeaker speech activity detection for the ICSI meeting recorder
IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01. Pub Date: 2001-12-09. DOI: 10.1109/ASRU.2001.1034599
T. Pfau, Daniel P. W. Ellis, Andreas Stolcke
{"title":"Multispeaker speech activity detection for the ICSI meeting recorder","authors":"T. Pfau, Daniel P. W. Ellis, Andreas Stolcke","doi":"10.1109/ASRU.2001.1034599","DOIUrl":"https://doi.org/10.1109/ASRU.2001.1034599","url":null,"abstract":"As part of a project into speech recognition in meeting environments, we have collected a corpus of multichannel meeting recordings. We expected the identification of speaker activity to be straightforward given that the participants had individual microphones, but simple approaches yielded unacceptably erroneous labelings, mainly due to crosstalk between nearby speakers and wide variations in channel characteristics. Therefore, we have developed a more sophisticated approach for multichannel speech activity detection using a simple hidden Markov model (HMM). A baseline HMM speech activity detector has been extended to use mixtures of Gaussians to achieve robustness for different speakers under different conditions. Feature normalization and crosscorrelation processing are used to increase the channel independence and to detect crosstalk. The use of both energy normalization and crosscorrelation based postprocessing results in a 35% relative reduction of the frame error rate. Speech recognition experiments show that it is beneficial in this multispeaker setting to use the output of the speech activity detector for presegmenting the recognizer input, achieving word error rates within 10% of those achieved with manual turn labeling.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"13 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120836563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 114
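A minimal sketch of the two signal-level ingredients the abstract highlights, per-channel energy normalization and cross-channel correlation for crosstalk detection, on synthetic two-channel data. The HMM segmenter itself is not reproduced, and the frame sizes, thresholds, and decision rule are illustrative assumptions.

import numpy as np

FRAME, HOP = 400, 160                     # 25 ms / 10 ms frames at 16 kHz

def frames(x, n):
    idx = np.arange(FRAME) + HOP * np.arange(n)[:, None]
    return x[idx]

def log_energy(xf):
    return np.log((xf ** 2).sum(axis=1) + 1e-10)

def normalize(e):
    # Per-channel normalization to reduce channel-gain differences.
    return (e - np.median(e)) / (e.std() + 1e-10)

def crosstalk_mask(a, b, corr_thresh=0.6):
    # Flag frames of channel `a` that look like crosstalk from `b`:
    # strongly correlated with `b` while `b` carries more energy.
    n = 1 + (min(len(a), len(b)) - FRAME) // HOP
    fa, fb = frames(a, n), frames(b, n)
    fa0 = fa - fa.mean(axis=1, keepdims=True)
    fb0 = fb - fb.mean(axis=1, keepdims=True)
    corr = (fa0 * fb0).sum(axis=1) / (np.sqrt(
        (fa0 ** 2).sum(axis=1) * (fb0 ** 2).sum(axis=1)) + 1e-10)
    return (np.abs(corr) > corr_thresh) & (log_energy(fb) > log_energy(fa))

rng = np.random.default_rng(2)
spk = rng.standard_normal(16000)                      # talker on channel 1
ch1, ch2 = spk, 0.3 * spk + 0.05 * rng.standard_normal(16000)
n = 1 + (16000 - FRAME) // HOP
e1 = normalize(log_energy(frames(ch1, n)))            # channel-independent scale
print("ch1 frames above threshold:", float((e1 > 0).mean()))   # crude VAD stand-in
print("ch2 frames flagged as crosstalk:", float(crosstalk_mask(ch2, ch1).mean()))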
Out-of-vocabulary word modeling using multiple lexical fillers
IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01. Pub Date: 2001-12-09. DOI: 10.1109/ASRU.2001.1034628
Gilles Boulianne, P. Dumouchel
{"title":"Out-of-vocabulary word modeling using multiple lexical fillers","authors":"Gilles Boulianne, P. Dumouchel","doi":"10.1109/ASRU.2001.1034628","DOIUrl":"https://doi.org/10.1109/ASRU.2001.1034628","url":null,"abstract":"In large vocabulary speech recognition, out-of-vocabulary words are an important cause of errors. We describe a lexical filler model that can be used in a single pass recognition system to detect out-of-vocabulary words and reduce the error rate. When rescoring word graphs with better acoustic models, word fillers cause a combinatorial explosion. We introduce a new technique, using several thousand lexical fillers, which produces word graphs that can be rescored efficiently. On a large French vocabulary continuous speech recognition task, lexical fillers achieved an OOV detection rate of 44% and allowed a 23% reduction in errors due to OOV words.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128927600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
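To illustrate the filler principle: alongside the real lexicon, filler entries absorb arbitrary phone sequences at a fixed per-phone penalty, and an OOV is flagged when a filler outscores every real word. The toy phone-string matcher and cost below are assumptions standing in for a real decoder, not the paper's lexical fillers.

def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j - 1] + (ca != cb), prev[j] + 1, cur[-1] + 1))
        prev = cur
    return prev[-1]

LEXICON = {"cat": "k a t".split(), "dog": "d o g".split()}
FILLER_COST_PER_PHONE = 0.6    # tuning knob: cheaper fillers -> more OOV flags

def decode_word(phones):
    # Return (word, cost); '<oov>' when the filler beats every real word.
    best_word, best = None, float("inf")
    for word, pron in LEXICON.items():
        c = edit_distance(phones, pron)
        if c < best:
            best_word, best = word, c
    filler_cost = FILLER_COST_PER_PHONE * len(phones)  # filler matches anything
    return ("<oov>", filler_cost) if filler_cost < best else (best_word, best)

print(decode_word("k a t".split()))      # ('cat', 0) -- in vocabulary
print(decode_word("z u m b a".split()))  # ('<oov>', 3.0) -- filler wins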
Automatic selection of transcribed training material
IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01. Pub Date: 2001-12-09. DOI: 10.1109/ASRU.2001.1034673
T. Kamm, Gerald G. Meyer
{"title":"Automatic selection of transcribed training material","authors":"T. Kamm, Gerald G. Meyer","doi":"10.1109/ASRU.2001.1034673","DOIUrl":"https://doi.org/10.1109/ASRU.2001.1034673","url":null,"abstract":"Conventional wisdom says that incorporating more training data is the surest way to reduce the error rate of a speech recognition system. This, in turn, guarantees that speech recognition systems are expensive to train, because of the high cost of annotating training data. We propose an iterative training algorithm that seeks to improve the error rate of a speech recognizer without incurring additional transcription cost, by selecting a subset of the already available transcribed training data. We apply the proposed algorithm to an alpha-digit recognition problem and reduce the error rate from 10.3% to 9.4% on a particular test set.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129054245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 16
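A minimal sketch of such an iterative selection loop, with a toy nearest-centroid classifier standing in for the recognizer. The selection criterion here (tentatively dropping the worst-fitting examples and keeping the subset only if held-out error does not increase) is an assumption; the abstract does not state the authors' criterion.

import numpy as np

rng = np.random.default_rng(3)

def train(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}   # class centroids

def error(model, X, y):
    classes = sorted(model)
    d = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes])
    return float((np.array(classes)[d.argmin(axis=0)] != y).mean())

# Toy data: the class-0 training data contains some mislabeled outliers.
Xtr = np.vstack([rng.normal(0, 1, (80, 2)), rng.normal(3, 1, (80, 2))])
ytr = np.repeat([0, 1], 80)
ytr[:8] = 1                                # simulated bad transcriptions
Xdev = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
ydev = np.repeat([0, 1], 50)

keep = np.ones(len(ytr), bool)
model = train(Xtr, ytr)
best = error(model, Xdev, ydev)
for _ in range(5):
    # How well does each example fit its own label under the current model?
    margin = (np.linalg.norm(Xtr - model[1], axis=1)
              - np.linalg.norm(Xtr - model[0], axis=1))
    fit = np.where(ytr == 0, margin, -margin)
    cand = keep.copy()
    cand[np.argsort(fit)[: len(ytr) // 20]] = False   # tentatively drop worst 5%
    m2 = train(Xtr[cand], ytr[cand])
    e2 = error(m2, Xdev, ydev)
    if e2 <= best:                         # keep the subset only if it helps
        keep, model, best = cand, m2, e2
print("dev error:", best, "examples kept:", int(keep.sum()))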
Grammar learning for spoken language understanding
IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01. Pub Date: 2001-12-09. DOI: 10.1109/ASRU.2001.1034645
Ye-Yi Wang, A. Acero
{"title":"Grammar learning for spoken language understanding","authors":"Ye-Yi Wang, A. Acero","doi":"10.1109/ASRU.2001.1034645","DOIUrl":"https://doi.org/10.1109/ASRU.2001.1034645","url":null,"abstract":"Many state-of-the-art conversational systems use semantic-based robust understanding and manually derived grammars, a very time-consuming and error-prone process. This paper describes a machine-aided grammar authoring system that enables a programmer to develop rapidly a high quality grammar for conversational systems. This is achieved with a combination of domain-specific semantics, a library grammar, syntactic constraints and a small number of example sentences that have been semantically annotated. Our experiments show that the learned semantic grammars consistently outperform manually authored grammars, requiring much less authoring load.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131321451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 30
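One step such a machine-aided authoring tool can automate is turning semantically annotated examples into template rules by substituting library-grammar nonterminals for the annotated spans; the sketch below shows that step only. The annotation format, rule syntax, and library contents are invented for illustration and are not the paper's actual representation.

# Library grammar: nonterminals the developer gets "for free".
LIBRARY = {"CITY": {"boston", "seattle"}, "DATE": {"tomorrow", "monday"}}

def induce_rule(sentence, annotation):
    # `annotation` maps a slot name to the exact phrase it covers.
    rule = sentence.lower()
    for slot, phrase in annotation.items():
        assert phrase in rule and phrase in LIBRARY[slot]
        rule = rule.replace(phrase, f"<{slot}>")
    return rule

examples = [
    ("Show flights to Boston tomorrow", {"CITY": "boston", "DATE": "tomorrow"}),
    ("Show flights to Seattle on Monday", {"CITY": "seattle", "DATE": "monday"}),
]
for rule in sorted({induce_rule(s, a) for s, a in examples}):
    print("ShowFlights ->", rule)
# ShowFlights -> show flights to <CITY> <DATE>
# ShowFlights -> show flights to <CITY> on <DATE>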
The symbiosis of DSP and speech recognition, or an outsider's view of the inside
IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01. Pub Date: 2001-12-09. DOI: 10.1109/ASRU.2001.1034575
J. Kaiser
{"title":"The symbiosis of DSP and speech recognition or an outsider's view of the inside","authors":"J. Kaiser","doi":"10.1109/ASRU.2001.1034575","DOIUrl":"https://doi.org/10.1109/ASRU.2001.1034575","url":null,"abstract":"From an historical review of how we got to where we are now, we discuss the interrelationship between our system design objectives and goals, our modeling of the speech signal and its generation and parameterization, and the broadly developing DSP methodology. We take a critical look at some of the underlying assumptions in. our modeling to see if they may be limiting the performance that can be obtained with ASR (automatic speech recognition) systems. We close with some open questions and challenges for new work.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126830859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1