CENSREC-2: corpus and evaluation environments for in car continuous digit speech recognition
Authors: Satoshi Nakamura, M. Fujimoto, K. Takeda
Published in: IEICE technical report. Natural language understanding and models of communication, 2005-12-15
DOI: 10.21437/Interspeech.2006-99
Citations: 23
Abstract
This paper introduces CENSREC-2, a common database and evaluation framework for connected digit speech recognition in real driving car environments, developed as an outcome of the IPSJ-SIG SLP Noisy Speech Recognition Evaluation Working Group. The speech data of CENSREC-2 was collected using two microphones, a close-talking microphone and a hands-free microphone, under three car speeds and four car conditions. CENSREC-2 provides four evaluation environments designed using the speech data collected under these car conditions.

Index Terms: noisy speech recognition, common evaluation framework, in-car speech corpus.

1. Introduction

Recent progress in speech recognition technology has been driven by the advent of statistical approaches and large-scale corpora. It is also widely known that progress has been accelerated by the U.S. DARPA projects [1] initiated in the late ’80s, in which participants competitively developed speech recognition systems for the same task using the same training and test corpora. However, speech recognition performance must still be improved before systems can be deployed in the noisy environments where speech recognition applications are actually used. Noise robustness is therefore an emerging and crucial problem for speech recognition techniques.

With regard to noise robustness, there have been two major evaluation projects, SPINE (SPINE1 and SPINE2) [2] and AURORA [3]-[9]. The SPINE (SPeech recognition In Noisy Environments) project was organized by the U.S. DARPA, with SPINE1 in 2000 and SPINE2 in 2001. AURORA, on the other hand, was organized by the AURORA group of the European Telecommunications Standards Institute (ETSI) [10].
To date, AURORA2 [3] (a connected digit corpus with additive noise), AURORA3 [4]-[7] (an in-car noisy digit corpus), and AURORA4 [8, 9] (a large-vocabulary continuous speech recognition corpus with additive noise) have been distributed with HTK (Hidden Markov Model Toolkit) [11] scripts, which can be used to obtain baseline performance [12].

The authors voluntarily organized a special working group in October 2001 under the auspices of the Information Processing Society of Japan (IPSJ) to assess speech recognition technology in noisy environments. The focus of the working group included planning comprehensive fundamental assessments of noisy speech recognition, collecting standardized corpora, developing evaluation strategies, and distributing standardized processing modules. As outcomes of the working group, we have already produced the Japanese AURORA-2, AURORA-2J [13], which comprises the AURORA-2 English digit strings translated into Japanese. We have also produced CENSREC-3 (Corpora and Environments for Noisy Speech RECognition) [14], our original evaluation framework for isolated word recognition in real driving car environments. The main target application of CENSREC-3 is hands-free voice control of car navigation systems; CENSREC-3 is therefore designed as an evaluation framework that assumes speech-oriented man-machine communication in several car environments.

In this paper, we introduce CENSREC-2, a common database and evaluation framework for connected digit speech recognition in real driving car environments. Speech data of CENSREC-2 was collected using two microphones, a close-talking microphone and a hands-free microphone, under 11 carefully controlled driving conditions, i.e., combinations of three car speeds and four car conditions (11 of the 3 × 4 = 12 possible combinations). CENSREC-2 provides four evaluation environments designed using the speech data collected under these car conditions.
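Evaluation frameworks such as AURORA and CENSREC-2 score a recognizer's output digit strings against reference transcriptions. The text above does not state the scoring rule. The following is a minimal sketch that assumes the HTK-style word accuracy commonly reported for AURORA baselines, Acc = (N - D - S - I) / N, where N is the number of reference digits and S, D, and I are the substitutions, deletions, and insertions from a minimum-edit-distance alignment. The function names `align_counts` and `word_accuracy` and the digit labels are hypothetical. The alignment uses unit edit costs. HTK's HResults tool applies its own weighted alignment, so individual error counts may occasionally differ from this sketch.

```python
from typing import Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) from a unit-cost
    minimum-edit-distance alignment of hyp against ref (hypothetical helper)."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (total_edits, subs, dels, ins) for ref[:i] vs hyp[:j];
    # tuples compare lexicographically, so min() picks the fewest total edits.
    cost = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # every reference digit deleted
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # every hypothesis digit inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, ins = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, ins)  # correct match, no cost
            else:
                diag = (e + 1, s + 1, d, ins)  # substitution
            e, s, d, ins = cost[i - 1][j]
            up = (e + 1, s, d + 1, ins)  # deletion of ref[i - 1]
            e, s, d, ins = cost[i][j - 1]
            left = (e + 1, s, d, ins + 1)  # insertion of hyp[j - 1]
            cost[i][j] = min(diag, up, left)
    _, subs, dels, ins = cost[n][m]
    return subs, dels, ins


def word_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """HTK-style word accuracy in percent: 100 * (N - D - S - I) / N."""
    if not ref:
        raise ValueError("reference must contain at least one digit")
    subs, dels, ins = align_counts(ref, hyp)
    n = len(ref)
    return 100.0 * (n - dels - subs - ins) / n


if __name__ == "__main__":
    ref = "1 2 3 4".split()
    hyp = "1 2 2 3 4".split()  # one inserted digit
    print(align_counts(ref, hyp))   # (0, 0, 1)
    print(word_accuracy(ref, hyp))  # 75.0
```

Because insertions are subtracted, this accuracy measure can become negative when a recognizer inserts many spurious digits. For example, the reference "1" scored against the hypothesis "2 3" gives -100.0.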