IEEE Workshop on Automatic Speech Recognition and Understanding, 2001 (ASRU '01): Latest Articles

Recognition experiments with the SpeechDat-Car Aurora Spanish database using 8 kHz- and 16 kHz-sampled signals
C. Nadeu, M. Tolos
Pub Date: 2001-12-09 | DOI: 10.1109/ASRU.2001.1034606
Abstract: Like the other SpeechDat-Car databases, the Spanish one has been collected using a 16 kHz sampling frequency and several microphone positions and environmental noises. We aim to clarify whether there is any advantage, in terms of recognition performance, in processing the 16 kHz-sampled signals instead of the usual 8 kHz-sampled ones. Recognition tests have been carried out within the Aurora experimental framework, which includes signals from both a close-talking microphone and a distant microphone. Our preliminary results indicate that it is possible to obtain a performance improvement from the increased bandwidth in the noisy car environment.
Citations: 3

Collaborative steering of microphone array and video camera toward multi-lingual tele-conference through speech-to-speech translation
T. Nishiura, R. Gruhn, S. Nakamura
Pub Date: 2001-12-09 | DOI: 10.1109/ASRU.2001.1034602
Abstract: It is very important for multilingual teleconferencing through speech-to-speech translation to capture distant-talking speech with high quality. In addition, the speaker's image is needed to realize natural communication in such a conference. A microphone array is an ideal candidate for capturing distant-talking speech: uttered speech can be enhanced, and speaker images captured, by steering a microphone array and a video camera toward the speaker. However, automatic steering requires localizing the talker. To address this problem, we propose real-time collaborative steering of the microphone array and the video camera for a multilingual teleconference through speech-to-speech translation. We conducted experiments in a real room environment. The speaker localization rate (i.e., speaker image capturing rate) was 97.7%, the speech recognition rate was 90.0%, and the TOEIC score was 530-540 points, with the speaker located 2.0 meters from the microphone array.
Citations: 7

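The steering described above depends on localizing the talker first. Purely as an illustration (the paper does not specify its localization method), a classical primitive for this is estimating the time difference of arrival (TDOA) between two microphones by cross-correlation; the signals and delay below are synthetic.

```python
# Toy sketch (not the paper's method): estimating the time difference of
# arrival (TDOA) of a sound between two microphones via cross-correlation.

def cross_correlate(x, y, max_lag):
    """Return the lag in [-max_lag, max_lag] maximizing sum x[n]*y[n-lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(x[n] * y[n - lag]
                    for n in range(len(x))
                    if 0 <= n - lag < len(y))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic "speech" burst arriving 3 samples later at mic 2.
pulse = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0, 0.0]
mic1 = pulse + [0.0] * 5
mic2 = [0.0] * 3 + pulse + [0.0] * 2

delay = cross_correlate(mic2, mic1, max_lag=5)
print(delay)  # 3: mic2 lags mic1 by 3 samples
```

Given the microphone spacing d and the speed of sound c, a lag of tau seconds maps to a direction via sin(theta) = c*tau/d, which is how a delay estimate becomes a steering angle for the array and camera.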
Transducer composition for "on-the-fly" lexicon and language model integration
D. Caseiro, I. Trancoso
Pub Date: 2001-12-09 | DOI: 10.1109/ASRU.2001.1034667
Abstract: We present the use of a specialized composition algorithm that allows the generation of a determinized search network for ASR in a single step. The algorithm is exact in the sense that the result is determinized when the lexicon and the language model are represented as determinized transducers. The composition and determinization are performed simultaneously, which is of great importance for "on-the-fly" operation. The algorithm pushes the language model weights towards the initial state of the network. Our results show that it is advantageous to use the maximum amount of information as early as possible in the decoding procedure.
Citations: 28

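The core "on-the-fly" idea, creating composed states only as the search reaches them, can be sketched in a few lines. The toy below uses deterministic, epsilon-free transducers over the tropical semiring (weights add) and is purely illustrative: the paper's specialized algorithm additionally determinizes the result and pushes LM weights, and real systems handle epsilon labels and non-determinism.

```python
# Toy sketch of on-the-fly composition of two deterministic weighted
# transducers. Each transducer: {state: {input: (next_state, output, weight)}}.
# Composed states are (a, b) pairs built only when reached; "-" stands in
# for a placeholder output symbol (real composition handles true epsilons).

def step(A, B, state, x):
    """Follow one composed arc from composed state (a, b) on input x."""
    a, b = state
    arc_a = A.get(a, {}).get(x)          # A consumes x, emits y
    if arc_a is None:
        return None
    a2, y, w1 = arc_a
    arc_b = B.get(b, {}).get(y)          # B consumes y, emits z
    if arc_b is None:
        return None
    b2, z, w2 = arc_b
    return (a2, b2), z, w1 + w2          # tropical semiring: weights add

def run(A, B, start, inputs):
    state, outputs, total = start, [], 0.0
    for x in inputs:
        arc = step(A, B, state, x)
        if arc is None:
            return None                  # dead end: no path in A o B
        state, z, w = arc
        outputs.append(z)
        total += w
    return outputs, total

# A: a tiny lexicon mapping phones to a word; B: a 1-state "LM" scoring words.
A = {0: {"k": (1, "-", 0.0)}, 1: {"ae": (2, "-", 0.0)}, 2: {"t": (0, "cat", 1.0)}}
B = {0: {"-": (0, "-", 0.0), "cat": (0, "cat", 2.5)}}
print(run(A, B, (0, 0), ["k", "ae", "t"]))   # (['-', '-', 'cat'], 3.5)
```

Note how the full composed machine is never materialized: only the three (a, b) pairs along the path are ever constructed, which is what makes single-pass integration of lexicon and LM feasible for large models.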
Searching for the missing piece [speech recognition]
W. N. Choi, Y. W. Wong, T. Lee, P. Ching
Pub Date: 2001-12-09 | DOI: 10.1109/ASRU.2001.1034629
Abstract: The tree-trellis forward-backward algorithm has been widely used for N-best searching in continuous speech recognition. In conventional approaches, the heuristic score used for the A* backward search is derived from the partial-path scores recorded during the forward pass. The inherently delayed use of a language model (LM) in the lexical tree structure leads to inefficient pruning, and the recorded partial-path score is an underestimated heuristic. This paper presents a novel method of computing a heuristic score that is more accurate than the partial-path score. The goal is to recover high-score sentence hypotheses that may have been pruned halfway during the forward search due to the delayed use of the LM. For the application of Hong Kong stock information inquiries, the proposed technique shows a noticeable performance improvement. In particular, a relative error-rate reduction of 12% has been achieved for top-1 sentences.
Citations: 0

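The tree-trellis scheme the paper builds on can be made concrete: a forward pass records the best start-to-node cost f[v], and a backward A* search uses f as its heuristic, so complete hypotheses pop off the queue in order of total cost. The lattice and costs below are invented; this sketch uses the plain forward score as the heuristic and does not reproduce the paper's more accurate variant.

```python
# Toy sketch of tree-trellis N-best search on a small word lattice.
import heapq

# Lattice arcs stored backward: arcs[v] = list of (u, word, cost)
# for arcs u --word--> v. Lower cost is better.
arcs = {
    "end": [("b1", "stocks", 1.0), ("b2", "stock", 0.5)],
    "b1":  [("start", "hong_kong", 1.0)],
    "b2":  [("start", "hong_kong", 2.0)],
}

# Forward pass result: best cost from "start" to each node
# (precomputed by hand for this tiny acyclic lattice).
f = {"start": 0.0, "b1": 1.0, "b2": 2.0, "end": 2.0}

def nbest(n):
    # Queue entries: (g_backward + f[v], node v, words collected so far,
    # in reverse order). Popping in priority order yields exact N-best.
    heap = [(f["end"], "end", ())]
    results = []
    while heap and len(results) < n:
        priority, v, words = heapq.heappop(heap)
        if v == "start":
            results.append((priority, list(reversed(words))))
            continue
        for u, word, cost in arcs.get(v, []):
            g = priority - f[v] + cost            # backward cost so far
            heapq.heappush(heap, (g + f[u], u, words + (word,)))
    return results

print(nbest(2))  # best: hong_kong stocks (2.0); second: hong_kong stock (2.5)
```

The paper's point is that when the LM is applied late in a lexical tree, the recorded f[v] underestimates true partial-path quality, so good hypotheses get pruned; a tighter heuristic recovers them without changing this overall search structure.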
Simultaneous recognition of distant-talking speech of multiple sound sources based on 3-D N-best search algorithm
P. Heracleous, S. Nakamura, K. Shikano
Pub Date: 2001-12-09 | DOI: 10.1109/ASRU.2001.1034600
Abstract: This paper deals with the simultaneous recognition of distant-talking speech from multiple talkers using the 3-D N-best search algorithm. We describe the basic idea of the 3-D N-best search and address two additional techniques implemented in the baseline system: path distance-based clustering and likelihood normalization, both of which proved necessary to build an efficient system for our purpose. In previous work we reported results on simulated data. In this paper we report experiments on reverberant data, both simulated by the image method and recorded in a real room. The image method was used to determine the relationship between accuracy and reverberation time, and the real data were used to evaluate the practical performance of our algorithm. The top-3 simultaneous word accuracy obtained was 73.02% under a 162 ms reverberation time using the image method.
Citations: 1

Acoustic analysis and recognition of whispered speech
Taisuke Itoh, K. Takeda, F. Itakura
Pub Date: 2001-12-09 | DOI: 10.1109/ASRU.2001.1034676
Abstract: The acoustic properties and a recognition method for whispered speech are discussed. A whispered speech database was prepared, consisting of whispered speech, normal speech, and the corresponding facial video images for more than 6,000 sentences from 100 speakers. The comparison between whispered and normal utterances shows that: 1) the cepstrum distance between them is 4 dB for voiced and 2 dB for unvoiced phonemes; 2) the spectral tilt of whispered speech is less sloped than that of normal speech; 3) the frequencies of the lower formants (below 1.5 kHz) are lower than in normal speech. Acoustic models (HMMs) trained on the whispered speech database attain 60% accuracy in syllable recognition experiments. This improves to 63% when MLLR (maximum likelihood linear regression) adaptation is applied, while normal-speech HMMs adapted with whispered speech attain only 56% syllable accuracy.
Citations: 13

Robust speaker clustering in eigenspace
R. Faltlhauser, G. Ruske
Pub Date: 2001-12-09 | DOI: 10.1109/ASRU.2001.1034588
Abstract: We propose a speaker clustering scheme that works in 'eigenspace'. Speaker models are transformed to a low-dimensional subspace using 'eigenvoices', where simple distance measures such as the Euclidean distance can be applied in the clustering procedure. Moreover, clustering can be accomplished with base models (for eigenvoice projection) such as Gaussian mixture models as well as conventional HMMs; in the case of HMMs, re-projection to the original space readily yields acoustic models. Clustering in the subspace produces well-balanced clusters and is easy to control. In the field of speaker adaptation, several principal techniques can be distinguished. The most prominent among them are Bayesian adaptation (e.g. MAP), transformation-based approaches (MLLR, maximum likelihood linear regression), and so-called eigenspace techniques. The latter have become increasingly popular, as they make use of a priori information about the distribution of speaker models; the basic approach is commonly called the eigenvoice (EV) approach. Besides these techniques, speaker clustering is a further attractive adaptation scheme, especially since it can be, and has been, easily combined with the above methods.
Citations: 25

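The scheme's two ingredients, projection onto a few "eigenvoices" followed by simple Euclidean clustering in the subspace, can be illustrated with a toy example. Everything here is invented for illustration: the basis is hand-picked rather than learned from speaker models by PCA, and the "speaker vectors" are arbitrary numbers standing in for stacked model parameters.

```python
# Toy sketch of speaker clustering in an "eigenspace": high-dimensional
# speaker vectors are projected onto a small basis of "eigenvoices",
# then clustered with plain Euclidean distance in the low-dim space.

def project(v, basis):
    """Coordinates of v against each basis vector (dot products)."""
    return tuple(sum(vi * bi for vi, bi in zip(v, b)) for b in basis)

def dist2(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def kmeans2(points, c0, c1, iters=10):
    """Minimal 2-means in the projected space."""
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            groups[dist2(p, c0) > dist2(p, c1)].append(p)
        c0, c1 = [tuple(sum(x) / len(g) for x in zip(*g)) if g else c
                  for g, c in zip(groups, (c0, c1))]
    return groups

# Two hypothetical "eigenvoices" in a 4-dim model-parameter space:
# note the last two dimensions are ignored by the projection, so noise
# there does not disturb the clustering.
basis = [(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)]
speakers = [(0.1, 0.0, 5.0, 2.0), (0.2, 0.1, -3.0, 7.0),   # group A
            (2.0, 2.1, 4.0, 1.0), (2.2, 1.9, -2.0, 6.0)]   # group B
pts = [project(s, basis) for s in speakers]
lo, hi = kmeans2(pts, pts[0], pts[2])
print(sorted(lo), sorted(hi))
```

The design point the abstract makes survives even in the toy: once speakers live in a low-dimensional subspace, any off-the-shelf distance-based clustering applies, regardless of whether the underlying models were GMMs or HMMs.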
Evaluating dialogue strategies and user behavior
M. Danieli
Pub Date: 2001-12-09 | DOI: 10.1109/ASRU.2001.1034630
Abstract: Summary form only given. The need for accurate and flexible evaluation frameworks for spoken and multimodal dialogue systems has become crucial. In the early design phases of spoken dialogue systems, it is worthwhile to evaluate how easily users interact with different dialogue strategies, rather than the efficiency of the dialogue system in providing the required information. The success of a task-oriented dialogue system depends greatly on its ability to provide a meaningful match between user expectations and system capabilities, and a good trade-off improves the user's effectiveness. The evaluation methodology requires three steps. The first step aims to identify the different tokens and relations that constitute the user's mental model of the task. Once tokens and relations have been considered in designing one or more dialogue strategies, the evaluation enters its second step, a between-group experiment in which each strategy is tried by a representative set of experimental subjects. The third step measures user effectiveness in providing the spoken dialogue system with the information it needs to solve the task. The paper argues that applying this three-step evaluation method may increase our understanding of the user's mental model of a task during the early stages of development of a spoken language agent. Experimental data supporting this claim are reported.
Citations: 0

Incremental language models for speech recognition using finite-state transducers
Hans J. G. A. Dolfing, I. L. Hetherington
Pub Date: 2001-12-09 | DOI: 10.1109/ASRU.2001.1034620
Abstract: In the context of the weighted finite-state transducer approach to speech recognition, we investigate a novel decoding strategy to deal with very large n-gram language models often used in large-vocabulary systems. In particular, we present an alternative to full, static expansion and optimization of the finite-state transducer network. This alternative is useful when the individual knowledge sources, modeled as transducers, are too large to be composed and optimized. While the recognition decoder perceives a single, weighted finite-state transducer, we apply a divide-and-conquer technique to split the language model into two parts which add up exactly to the original language model. We investigate the merits of these 'incremental language models' and present some initial results.
Citations: 50

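The key property, two component models whose scores "add up exactly to the original language model", can be illustrated with a dictionary-based bigram model: G1 carries context-free (unigram) log-probabilities that can be compiled into the network early, and G2 carries the bigram correction applied incrementally. The probabilities are invented, and real incremental LMs operate on transducers rather than dictionaries.

```python
import math

# Toy sketch of the "incremental LM" split: a bigram model G is divided
# into G1 (unigram log-probs) and G2 (bigram correction) such that
# G1 + G2 reproduces G's score exactly, term by term.

bigram_logp = {                       # G: log P(word | previous word)
    ("<s>", "the"): math.log(0.6),
    ("the", "cat"): math.log(0.3),
    ("cat", "sat"): math.log(0.5),
}
unigram_logp = {"the": math.log(0.2), "cat": math.log(0.1), "sat": math.log(0.1)}

G1 = dict(unigram_logp)                                        # cheap, small
G2 = {(h, w): lp - unigram_logp[w]                             # correction
      for (h, w), lp in bigram_logp.items()}

def score_full(sentence):
    return sum(bigram_logp[(h, w)] for h, w in zip(sentence, sentence[1:]))

def score_split(sentence):
    return sum(G1[w] + G2[(h, w)] for h, w in zip(sentence, sentence[1:]))

s = ["<s>", "the", "cat", "sat"]
print(score_full(s), score_split(s))   # identical up to rounding
```

Because G1(w) + G2(h, w) telescopes back to the full bigram score, the decoder sees the same search space as with the monolithic model, while only the small G1 has to be composed and optimized statically.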
Speech interfaces for mobile communications
H. Nakano
Pub Date: 2001-12-09 | DOI: 10.1109/ASRU.2001.1034596
Abstract: This paper discusses speech interfaces for mobile communication. Mobile interfaces have three important design rules: do not disturb the user's main task, work within the restrictions of the user's abilities, and minimize resource requirements. Social acceptance is also important. In Japan, trial and regular services with speech interfaces in mobile environments have already been launched, but they are not yet widely used and must be improved. The speech interface will not replace Web browsers; rather, it should support and interwork with other interfaces. We also have to discover content that suits speech interfaces.
Citations: 2