1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings: Latest Articles

Progress towards speech models that model speech

1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings Pub Date : 1997-12-14 DOI: 10.1109/ASRU.1997.658995

Martin Russell

Citations: 7

Learning dialogue strategies within the Markov decision process framework

1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings Pub Date : 1997-12-14 DOI: 10.1109/ASRU.1997.658989

E. Levin, R. Pieraccini, W. Eckert

Citations: 155

A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)

1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings Pub Date : 1997-12-14 DOI: 10.1109/ASRU.1997.659110

J. Fiscus

Abstract: Describes a system developed at NIST to produce a composite automatic speech recognition (ASR) system output when the outputs of multiple ASR systems are available, and for which, in many cases, the composite ASR output has a lower error rate than any of the individual systems. The system implements a "voting" or rescoring process to reconcile differences in ASR system outputs. We refer to this system as the NIST Recognizer Output Voting Error Reduction (ROVER) system. As additional knowledge sources are added to an ASR system (e.g. acoustic and language models), error rates are typically decreased. This paper describes a post-recognition process which models the output generated by multiple ASR systems as independent knowledge sources that can be combined and used to generate an output with reduced error rate. To accomplish this, the outputs of multiple ASR systems are combined into a single, minimal-cost word transition network (WTN) via iterative applications of dynamic programming (DP) alignments. The resulting network is searched by an automatic rescoring or "voting" process that selects the output sequence with the lowest score.
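
The align-then-vote idea in the ROVER abstract can be illustrated with a small sketch. This is not the NIST implementation: it assumes unit edit costs, treats each slot of the word transition network as a plain list with one entry per system, and replaces the paper's scoring with a simple majority vote. The names `NULL`, `align`, and `vote` are hypothetical.

```python
from collections import Counter

NULL = "@"  # hypothetical marker meaning "this system has no word in this slot"


def align(network, hyp):
    """Merge word list `hyp` into `network` using a minimum edit distance alignment.

    `network` is a list of slots; each slot lists one entry per system merged so far.
    Unit costs for substitution, insertion, and deletion are an assumption.
    """
    n, m = len(network), len(hyp)
    k = len(network[0]) if network else 0  # systems already in the network

    def sub(i, j):
        # Matching any word already in the slot costs nothing.
        return 0 if hyp[j - 1] in network[i - 1] else 1

    # cost[i][j]: cheapest alignment of the first i slots with the first j words
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j - 1] + sub(i, j),
                             cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1)

    # Trace back from the end, extending every slot with the new system's entry.
    merged = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + sub(i, j):
            merged.append(network[i - 1] + [hyp[j - 1]])
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            merged.append(network[i - 1] + [NULL])  # new system skips this slot
            i -= 1
        else:
            merged.append([NULL] * k + [hyp[j - 1]])  # only the new system has a word
            j -= 1
    merged.reverse()
    return merged


def vote(network):
    """Pick the most frequent entry per slot; ties go to the earliest system."""
    words = []
    for slot in network:
        winner, _ = Counter(slot).most_common(1)[0]
        if winner != NULL:
            words.append(winner)
    return words


hypotheses = [
    "the cat sat on the hat".split(),
    "the cat sat on mat".split(),
    "the bat sat on the mat".split(),
]
network = []
for hyp in hypotheses:
    network = align(network, hyp)
combined = vote(network)
```

In this toy run each of the three hypotheses contains a different error, and the vote over the aligned slots yields a word sequence that none of the individual systems produced on its own.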

Citations: 1221

Stream derivation and clustering scheme for subspace distribution clustering hidden Markov model

1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings Pub Date : 1997-12-14 DOI: 10.1109/ASRU.1997.659109

Brian Mak, Enrico Bocchieri, Etienne Barnard

Abstract: Bocchieri and Mak (Proc. Eurospeech, vol. 1, p. 107-10, 1997) introduced a novel subspace distribution clustering hidden Markov model (SDCHMM) as an approximation to a continuous-density HMM (CDHMM). Deriving SDCHMMs from CDHMMs requires a definition of multiple streams and a Gaussian clustering scheme. Previously, we have tried 4 and 13 streams, which are common but ad hoc choices. In this paper, we present a simple and coherent definition for streams of any dimension: the streams comprise the most correlated features. The new definition is shown to give better performance in two speech recognition tasks. The clustering scheme of Bocchieri and Mak is an O(n^2) algorithm which can be slow when the number of Gaussians in the original CDHMMs is large. Now, we have devised a modified k-means clustering scheme using the Bhattacharyya distance as the distance measure between Gaussian clusters. Not only is the new clustering scheme faster but, when combined with the new stream definitions, we now obtain SDCHMMs which perform at least as well as the original CDHMMs (with better results in some cases).
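
The Bhattacharyya distance named in the abstract has a closed form for Gaussians. The sketch below assumes diagonal covariances (the abstract does not state the covariance type) and shows only the distance plus a nearest-centroid assignment, which is one step of an ordinary k-means loop, not the authors' modified scheme. The names `bhattacharyya` and `nearest_centroid` are hypothetical.

```python
import math


def bhattacharyya(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    Each argument is a list with one entry per feature dimension.
    """
    dist = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)                               # averaged variance
        dist += 0.125 * (m1 - m2) ** 2 / v                # mean separation term
        dist += 0.5 * math.log(v / math.sqrt(v1 * v2))    # variance mismatch term
    return dist


def nearest_centroid(gaussian, centroids):
    """Return the index of the centroid closest to `gaussian`.

    `gaussian` and each centroid are (means, variances) pairs.
    """
    mean, var = gaussian
    return min(range(len(centroids)),
               key=lambda k: bhattacharyya(mean, var, *centroids[k]))


centroids = [([5.0], [1.0]), ([0.0], [1.0])]
assigned = nearest_centroid(([0.1], [1.0]), centroids)
```

The first term grows with the separation of the means and the second with the mismatch of the variances, so two Gaussians with equal means but different spreads still have a nonzero distance.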

Citations: 9

Pronunciation modelling for conversational speech recognition: a status report from WS97

1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings Pub Date : 1997-12-14 DOI: 10.1109/ASRU.1997.658973

B. Byrne, M. Finke, S. Khudanpur, J. McDonough, H. Nock, M. Riley, M. Saraçlar, Chuck Wooters, G. Zavaliagkos

Citations: 27

Phonetically adaptive cepstrum mean normalization for acoustic mismatch compensation

1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings Pub Date : 1997-12-14 DOI: 10.1109/ASRU.1997.659121

M. Morishima, T. Isobe, J. Takahashi

Citations: 4

Synergistic modalities for human/machine communication

1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings Pub Date : 1997-12-14 DOI: 10.1109/ASRU.1997.658967

J. Flanagan

Abstract: Natural communication with machines is a crucial factor in bringing the benefits of networked computers to mass markets. In particular, the sensory dimensions of sight, sound and touch are comfortable and convenient modalities for the human user. New technologies are now emerging in these domains that can support human/machine communication with features that emulate face-to-face interaction. A current challenge is how to integrate the, as yet, imperfect technologies to achieve synergies that transcend the benefit of a single modality. Because speech is a preferred means for human information exchange, conversational interaction with machines will play a central role in collaborative knowledge work mediated by networked computers. Utilizing speech in combination with simultaneous visual gestures and haptic signalling requires software agents that are able to fuse the error-susceptible sensory information into reliable interpretations that are responsive to (and anticipatory of) human user intentions. This report draws a perspective on research in human/machine communication technologies aimed at supporting computer conferencing and collaborative problem solving.

Citations: 2

A statistical language modeling approach integrating local and global constraints

1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings Pub Date : 1997-12-14 DOI: 10.1109/ASRU.1997.659014

J. Bellegarda

Citations: 10

Variable threshold vector quantization for reduced continuous density likelihood computation in speech recognition

1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings Pub Date : 1997-12-14 DOI: 10.1109/ASRU.1997.659108

S. Herman, R.A. Sukkar

Citations: 8

A tonotopic artificial neural network architecture for phoneme probability estimation

1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings Pub Date : 1997-12-14 DOI: 10.1109/ASRU.1997.659000

N. Strom

Citations: 11