CENSREC2: corpus and evaluation environments for in car continuous digit speech recognition

IEICE technical report. Natural language understanding and models of communication Pub Date : 2005-12-15 DOI:10.21437/Interspeech.2006-99

Satoshi Nakamura, M. Fujimoto, K. Takeda


Citations: 23

Abstract

This paper introduces CENSREC-2, a common database and evaluation framework for connected digit speech recognition in real driving car environments, as an outcome of the IPSJ-SIG SLP Noisy Speech Recognition Evaluation Working Group. The speech data of CENSREC-2 were collected using two microphones, a close-talking microphone and a hands-free microphone, under three car speeds and four car conditions. CENSREC-2 provides four evaluation environments, which are designed using speech data collected in these car conditions.

Index Terms: noisy speech recognition, common evaluation framework, in-car speech corpus.

1. Introduction

Recent progress in speech recognition technology has been brought about by the advent of statistical approaches and large-scale corpora. It is also widely known that progress was accelerated by the U.S. DARPA projects [1] initiated in the late ’80s, in which project participants competitively developed speech recognition systems for the same task, using the same training and test corpus.

However, current speech recognition performance must still be improved if systems are to be exposed to noisy environments, where speech recognition applications might be used in practice. Noise robustness is therefore an emerging and crucial problem to be solved for speech recognition techniques.

With regard to the noise robustness problem, there have been two major evaluation projects, SPINE1, 2 [2] and AURORA [3]-[9]. The SPINE (SPeech recognition In Noisy Environments) project was organized by the U.S. DARPA, with SPINE1 in 2000 and SPINE2 in 2001. AURORA, on the other hand, was organized by the European Telecommunications Standards Institute (ETSI) [10] AURORA group. To date, AURORA2 [3] (a connected digit corpus with additive noise), AURORA3 [4]-[7] (an in-car noisy digit corpus), and AURORA4 [8, 9] (a large-vocabulary continuous-speech recognition corpus with additive noise) have been distributed with HTK (HMM Tool Kit) [11] scripts, which can be used to obtain baseline performance [12].

The authors voluntarily organized a special working group in October 2001 under the auspices of the Information Processing Society of Japan in order to assess speech recognition technology in noisy environments. The focus of the working group included the planning of comprehensive fundamental assessments of noisy speech recognition, standardized corpus collection, evaluation strategy development, and distribution of standardized processing modules. As an outcome of the working group, we have already produced the Japanese AURORA-2, AURORA-2J [13], which comprises the English digits translated into Japanese. We have also produced CENSREC-3 (Corpora and Environments for Noisy Speech RECognition) [14], our original evaluation framework. CENSREC-3 is designed as an evaluation framework for isolated word recognition in real driving car environments. Its main target application is hands-free voice control of car navigation systems. Thus, CENSREC-3 is designed as an evaluation framework that assumes speech-oriented man-machine communication in several car environments.

In this paper, we introduce CENSREC-2, a common database and evaluation framework for connected digit speech recognition in real driving car environments. The speech data of CENSREC-2 were collected using two microphones, a close-talking microphone and a hands-free microphone, under 11 carefully controlled driving conditions, i.e., combinations of three car speeds and four car conditions. CENSREC-2 provides four evaluation environments, which are designed using speech data collected in these car conditions.
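The recording design described above can be laid out as a small grid. The following is a minimal sketch, not the authors' tooling: the text gives only the counts of car speeds and car conditions, so the labels `speed_1`, `condition_1`, and so on are hypothetical placeholders, as are the function names. It shows that three speeds and four conditions allow at most 12 combinations, of which the text reports 11 as recorded; which combination is absent is not stated here, so the sketch does not guess it.

```python
from itertools import product

# The two microphone types are named in the text.
MICROPHONES = ("close_talking", "hands_free")
# Hypothetical placeholder labels: the text gives only the counts.
CAR_SPEEDS = ("speed_1", "speed_2", "speed_3")
CAR_CONDITIONS = ("condition_1", "condition_2", "condition_3", "condition_4")


def full_condition_grid():
    """Return every (speed, condition) pair: the 3 x 4 upper bound."""
    return list(product(CAR_SPEEDS, CAR_CONDITIONS))


def recording_streams(conditions):
    """Pair each driving condition with each microphone.

    Assumes both microphones were recorded in every condition.
    """
    return [(speed, cond, mic)
            for (speed, cond) in conditions
            for mic in MICROPHONES]


if __name__ == "__main__":
    grid = full_condition_grid()
    print(len(grid))  # 12 possible combinations; the text reports 11 recorded
```

Under the assumption that both microphones were used in all 11 recorded conditions, `recording_streams` would yield 22 (speed, condition, microphone) streams for any 11-element subset of the grid.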
