2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)最新文献

Error simulation for training statistical dialogue systems 训练统计对话系统的误差模拟

2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-13 DOI: 10.1109/ASRU.2007.4430167

J. Schatzmann, Blaise Thomson, S. Young

引用次数: 90

Predictive linear transforms for noise robust speech recognition 基于预测线性变换的噪声鲁棒语音识别

2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-13 DOI: 10.1109/ASRU.2007.4430084

M. Gales, R. V. Dalen

引用次数: 31

Development of a phonetic system for large vocabulary Arabic speech recognition 大词汇量阿拉伯语语音识别语音系统的开发

2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-13 DOI: 10.1109/ASRU.2007.4430078

M. Gales, Frank Diehl, C. Raut, M. Tomalin, P. Woodland, Kai Yu

引用次数: 29

Speechfind for CDP: Advances in spoken document retrieval for the U. S. collaborative digitization program 面向CDP的语音检索:美国协作数字化计划的语音文档检索进展

2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430195

Wooil Kim, J. Hansen

{"title":"Speechfind for CDP: Advances in spoken document retrieval for the U. S. collaborative digitization program","authors":"Wooil Kim, J. Hansen","doi":"10.1109/ASRU.2007.4430195","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430195","url":null,"abstract":"This paper presents our recent advances for SpeechFind, a CRSS-UTD designed spoken document retrieval system for the U.S. based Collaborative Digitization Program (CDP). A proto-type of SpeechFind for the CDP is currently serving as the search engine for 1,300 hours of CDP audio content which contain a wide range of acoustic conditions, vocabulary and period selection, and topics. In an effort to determine the amount of user corrected transcripts needed to impact automatic speech recognition (ASR) and audio search, a web-based online interface for verification of ASR-generated transcripts was developed. The procedure for enhancing the transcription performance for SpeechFind is also presented. A selection of adaptation methods for language and acoustic models are employed depending on the acoustics of the corpora under test. Experimental results on the CDP corpus demonstrate that the employed model adaptation scheme using the verified transcripts is effective in improving recognition accuracy. Through a combination of feature/acoustic model enhancement and language model selection, up to 24.8% relative improvement in ASR was obtained. The SpeechFind system, employing automatic transcript generation, online CDP transcript correction, and our transcript reliability estimator, demonstrates a comprehensive support mechanism to ensure reliable transcription and search for U.S. libraries with limited speech technology experience.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121031290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

Development and portability of ASR and Q&A modules for real-environment speech-oriented guidance systems 面向真实环境语音制导系统的ASR和问答模块的开发与可移植性

2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430166

T. Cincarek, Hiromichi Kawanami, H. Saruwatari, K. Shikano

{"title":"Development and portability of ASR and Q&A modules for real-environment speech-oriented guidance systems","authors":"T. Cincarek, Hiromichi Kawanami, H. Saruwatari, K. Shikano","doi":"10.1109/ASRU.2007.4430166","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430166","url":null,"abstract":"In this paper, we investigate development and portability of ASR and Q&A modules of speech-oriented guidance systems for two different real environments. An initial prototype system has been constructed for a local community center using two years of human-labeled data collected by the system. Collection of real user data is required because ASR task and Q&A domain of a guidance system are defined by the target environment and potential users. However, since human preparation of data is always costly, most often only a relatively small amount real data will be available for system adaptation in practice. Therefore, the portability of the initial prototype system is investigated for a different environment, a local subway station. The purpose is to identify reusable system parts. The ASR module is found to be highly portable across the two environments. However, the portability of the Q&A module was only medium. From an objective analysis it became clear that this is mainly due to the environment-dependent domain differences between the two systems. This implicates that it will always be important to take the behavior of actual users under real conditions into account to build a system with high user satisfaction.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124965120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

Introduction of the METI project “development of fundamental speech recognition technology” 日本经济产业省“基础语音识别技术开发”项目介绍

2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430117

S. Furui, Tetsunori Kobayashi

引用次数: 0

A study on rescoring using HMM-based detectors for continuous speech recognition 基于hmm检测器的连续语音识别评分研究

2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430175

Qiang Fu, B. Juang

引用次数: 5

Never-ending learning system for on-line speaker diarization 永无休止的在线扬声器拨号学习系统

2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430197

K. Markov, Satoshi Nakamura

{"title":"Never-ending learning system for on-line speaker diarization","authors":"K. Markov, Satoshi Nakamura","doi":"10.1109/ASRU.2007.4430197","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430197","url":null,"abstract":"In this paper, we describe new high-performance on-line speaker diarization system which works faster than real-time and has very low latency. It consists of several modules including voice activity detection, novel speaker detection, speaker gender and speaker identity classification. All modules share a set of Gaussian mixture models (GMM) representing pause, male and female speakers, and each individual speaker. Initially, there are only three GMMs for pause and two speaker genders, trained in advance from some data. During the speaker diarization process, for each speech segment it is decided whether it comes from a new speaker or from already known speaker. In case of a new speaker, his/her gender is identified, and then, from the corresponding gender GMM, a new GMM is spawned by copying its parameters. This GMM is learned on-line using the speech segment data and from this point it is used to represent the new speaker. All individual speaker models are produced in this way. In the case of an old speaker, s/he is identified and the corresponding GMM is again learned on-line. In order to prevent an unlimited grow of the speaker model number, those models that have not been selected as winners for a long period of time are deleted from the system. This allows the system to be able to perform its task indefinitely in addition to being capable of self-organization, i.e. unsupervised adaptive learning, and preservation of the learned knowledge, i.e. speakers. Such functionalities are attributed to the so called Never-Ending Learning systems. For evaluation, we used part of the TC-STAR database consisting of European Parliament Plenary speeches. The results show that this system achieves a speaker diarization error rate of 4.6% with latency of at most 3 seconds.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129426523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 40

Hierarchical Pitman-Yor language models for ASR in meetings 会议中ASR的分层Pitman-Yor语言模型

2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430096

Songfang Huang, S. Renals

引用次数: 35

Investigating linguistic knowledge in a maximum entropy token-based language model 研究基于最大熵符号的语言模型中的语言知识

2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430104

Jia Cui, Yi Su, Keith B. Hall, F. Jelinek

引用次数: 8