1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings: Latest Articles

Progress towards speech models that model speech
1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings Pub Date : 1997-12-14 DOI: 10.1109/ASRU.1997.658995
Martin Russell
Abstract: This paper presents a personal view of recent advances in automatic speech recognition. The analysis is concerned with progress in speech pattern modelling, rather than recogniser performance. Despite the limitations of current approaches, it is argued that extension and development of these techniques provides a viable way forward. It is further suggested that the significance of a number of recent developments, such as sub-band speech recognition and segment modelling, is primarily in their potential for overcoming fundamental limitations of current HMM-based approaches, and not in the short-term improvement in recognition accuracy which has been achieved.
Citations: 7
Learning dialogue strategies within the Markov decision process framework
1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings Pub Date : 1997-12-14 DOI: 10.1109/ASRU.1997.658989
E. Levin, R. Pieraccini, W. Eckert
Abstract: We introduce a stochastic model for dialogue systems based on the Markov decision process. Within this framework we show that the problem of dialogue strategy design can be stated as an optimization problem, and solved by a variety of methods, including the reinforcement learning approach. The advantages of this new paradigm include objective evaluation of dialogue systems and their automatic design and adaptation. We show some preliminary results on learning a dialogue strategy for an air travel information system.
Citations: 155
A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)
1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings Pub Date : 1997-12-14 DOI: 10.1109/ASRU.1997.659110
J. Fiscus
Abstract: Describes a system developed at NIST to produce a composite automatic speech recognition (ASR) system output when the outputs of multiple ASR systems are available, and for which, in many cases, the composite ASR output has a lower error rate than any of the individual systems. The system implements a "voting" or rescoring process to reconcile differences in ASR system outputs. We refer to this system as the NIST Recognizer Output Voting Error Reduction (ROVER) system. As additional knowledge sources are added to an ASR system (e.g. acoustic and language models), error rates are typically decreased. This paper describes a post-recognition process which models the output generated by multiple ASR systems as independent knowledge sources that can be combined and used to generate an output with reduced error rate. To accomplish this, the outputs of multiple ASR systems are combined into a single, minimal-cost word transition network (WTN) via iterative applications of dynamic programming (DP) alignments. The resulting network is searched by an automatic rescoring or "voting" process that selects the output sequence with the lowest score.
Citations: 1221
Stream derivation and clustering scheme for subspace distribution clustering hidden Markov model
1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings Pub Date : 1997-12-14 DOI: 10.1109/ASRU.1997.659109
Brian Mak, Enrico Bocchieri, Etienne Barnard
Abstract: Bocchieri and Mak (Proc. Eurospeech, vol. 1, p. 107-10, 1997) introduced a novel subspace distribution clustering hidden Markov model (SDCHMM) as an approximation to a continuous-density HMM (CDHMM). Deriving SDCHMMs from CDHMMs requires a definition of multiple streams and a Gaussian clustering scheme. Previously, we have tried 4 and 13 streams, which are common but ad hoc choices. In this paper, we present a simple and coherent definition for streams of any dimension: the streams comprise the most correlated features. The new definition is shown to give better performance in two speech recognition tasks. The clustering scheme of Bocchieri and Mak is an O(n^2) algorithm which can be slow when the number of Gaussians in the original CDHMMs is large. Now, we have devised a modified k-means clustering scheme using the Bhattacharyya distance as the distance measure between Gaussian clusters. Not only is the new clustering scheme faster but, when combined with the new stream definitions, we now obtain SDCHMMs which perform at least as well as the original CDHMMs (with better results in some cases).
Citations: 9
Pronunciation modelling for conversational speech recognition: a status report from WS97
1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings Pub Date : 1997-12-14 DOI: 10.1109/ASRU.1997.658973
B. Byrne, M. Finke, S. Khudanpur, J. McDonough, H. Nock, M. Riley, M. Saraçlar, Chuck Wooters, G. Zavaliagkos
Abstract: Accurate modelling of pronunciation variability in conversational speech is an important component for automatic speech recognition. We describe some of the projects undertaken in this direction at WS97 [the Fifth LVCSR (large-vocabulary conversational speech recognition) Summer Workshop], held at Johns Hopkins University, Baltimore, in July-August 1997. We first illustrate a use of hand-labelled phonetic transcriptions of a portion of the Switchboard corpus, in conjunction with statistical techniques, to learn alternatives to canonical pronunciations of words. We then describe the use of these alternative pronunciations in a recognition experiment as well as in the acoustic training of an automatic speech recognition system. Our results show a reduction of the word error rate in both cases: 0.9% without acoustic retraining and 2.2% with acoustic retraining.
Citations: 27
Phonetically adaptive cepstrum mean normalization for acoustic mismatch compensation
1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings Pub Date : 1997-12-14 DOI: 10.1109/ASRU.1997.659121
M. Morishima, T. Isobe, J. Takahashi
Abstract: We propose a new technique that compensates for an acoustic mismatch. This technique is simple and can estimate the acoustic mismatch more accurately than conventional cepstrum mean normalization (CMN), because it takes into consideration the kind of phonemes and their frequency, and can calculate the acoustic mismatch in detail. In this procedure the acoustic mismatch can be estimated as the difference between the centroid vector of distorted speech and that of acoustic models. The cepstral mean of distorted speech is the centroid vector including the distortion. The centroid vector calculated from parameters of acoustic models is regarded as the centroid vector when the distorted speech is assumed to be clean speech. The acoustic models used for calculation are for phonemes that appear in the transcription of the speech. This technique achieves a high word error reduction rate of 73% for ordinary analog telephone speech and 70% for wireless telephone handset speech.
Citations: 4
Synergistic modalities for human/machine communication
1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings Pub Date : 1997-12-14 DOI: 10.1109/ASRU.1997.658967
J. Flanagan
Abstract: Natural communication with machines is a crucial factor in bringing the benefits of networked computers to mass markets. In particular, the sensory dimensions of sight, sound and touch are comfortable and convenient modalities for the human user. New technologies are now emerging in these domains that can support human/machine communication with features that emulate face-to-face interaction. A current challenge is how to integrate the, as yet, imperfect technologies to achieve synergies that transcend the benefit of a single modality. Because speech is a preferred means for human information exchange, conversational interaction with machines will play a central role in collaborative knowledge work mediated by networked computers. Utilizing speech in combination with simultaneous visual gestures and haptic signalling requires software agents that are able to fuse the error-susceptible sensory information into reliable interpretations that are responsive to (and anticipatory of) human user intentions. This report draws a perspective on research in human/machine communication technologies aimed at supporting computer conferencing and collaborative problem solving.
Citations: 2
A statistical language modeling approach integrating local and global constraints
1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings Pub Date : 1997-12-14 DOI: 10.1109/ASRU.1997.659014
J. Bellegarda
Abstract: A new framework is proposed to integrate the various constraints, both local and global, that are present in language. Local constraints are captured via n-gram language modeling, while global constraints are taken into account through the use of latent semantic analysis. An integrative formulation is derived for the combination of these two paradigms, resulting in several families of multi-span language models for large-vocabulary speech recognition. Because of the inherent complementarity in the two types of constraints, the performance of the integrated language models, as measured by perplexity, compares favorably with the corresponding n-gram performance.
Citations: 10
Variable threshold vector quantization for reduced continuous density likelihood computation in speech recognition
1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings Pub Date : 1997-12-14 DOI: 10.1109/ASRU.1997.659108
S. Herman, R.A. Sukkar
Abstract: Vector quantization (VQ) has been explored in the past as a means of achieving reductions in likelihood computation for hidden Markov models (HMMs) which use Gaussian mixtures for their output densities. In this paper, we present a new method for choosing which mixtures can be discarded for each pair of HMM state and vector quantization index. Traditionally, a global threshold was used to specify the maximum distance a mixture mean could lie from a VQ codeword before being considered negligible in likelihood calculations for observation vectors contained in that VQ cell. Our technique uses a threshold which varies with VQ cell volume. Thus, larger cells are allocated more mixtures than smaller cells, in order to provide a more uniform coverage of the acoustic space and thereby improve computational efficiency.
Citations: 8
A tonotopic artificial neural network architecture for phoneme probability estimation
1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings Pub Date : 1997-12-14 DOI: 10.1109/ASRU.1997.659000
N. Strom
Abstract: A novel sparse ANN connection scheme is proposed. It is inspired by the so-called tonotopic organization of the auditory nerve, and allows a more detailed representation of the speech spectrum to be input to an ANN than is commonly used. A consequence of the new connection scheme is that more resources are allocated to analysis within narrow frequency sub-bands, a concept that has recently been investigated by others with so-called sub-band ASR. ANNs with the proposed architecture have been evaluated on the TIMIT database for phoneme recognition, and are found to give better phoneme recognition performance than ANNs based on standard mel frequency cepstrum input. The lowest achieved phone error rate, 26.7%, is very close to the lowest published result for the core test set of the TIMIT database.
Citations: 11