Effects of acoustic mismatches on speech recognition accuracies due to playback-recorded speech corpus

2012 9th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, pp. 1-4. Pub Date: 2012-05-16. DOI: 10.1109/ECTICON.2012.6254211

A. Suchato, S. Chanjaradwichai, N. Kertkeidkachorn, S. Vorapatratorn, P. Hirankan, T. Suri, K. Likitsupin, S. Chuetanapinyo, P. Punyabukkana



Abstract

Modern speech recognition techniques rely on large amounts of speech data, whose acoustic characteristics match the operating environment, to train their acoustic models. Gathering training data from loudspeakers playing recorded speech utterances is far more practical than gathering it from human speakers. This paper presents results from speech recognition experiments that provide practical insights into the effects of utterances re-recorded from loudspeakers. A clean-speech corpus of sixty human speakers was built using two different microphones, and playbacks of the recordings were re-recorded. Results show that, with minimal lexical constraints, accuracy degraded for playback-trained systems, even with no mismatch between training and test data. However, mismatches did not affect cases with tighter high-level constraints, such as number recognition and limited-vocabulary word recognition. A procedure to reduce the mismatches caused by constructing a corpus from playbacks was introduced. The procedure was shown to bring the accuracy of a playback-trained system 48% closer to that of a system trained with speech recorded in a matched environment.
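One plausible reading of the "48% closer" figure is the fraction of the accuracy gap between the playback-trained system and the matched-environment system that the procedure recovers. The abstract does not state the exact definition, so the sketch below is an assumption, and the function name `gap_closed` and the accuracy values in the usage example are hypothetical placeholders, not results from the paper.

```python
def gap_closed(baseline_acc: float, improved_acc: float, target_acc: float) -> float:
    """Return the fraction of the baseline-to-target accuracy gap that was recovered.

    baseline_acc: accuracy of the playback-trained system (assumed reading)
    improved_acc: accuracy after applying the mismatch-reduction procedure
    target_acc:   accuracy of the system trained on matched-environment speech
    """
    gap = target_acc - baseline_acc
    if gap == 0:
        raise ValueError("baseline and target accuracies are equal; there is no gap to close")
    return (improved_acc - baseline_acc) / gap


if __name__ == "__main__":
    # Hypothetical accuracies chosen only to illustrate the arithmetic.
    fraction = gap_closed(baseline_acc=60.0, improved_acc=64.8, target_acc=70.0)
    print(f"{fraction:.0%} of the gap closed")
```

Under this reading, a value of 0 means the procedure gave no improvement and a value of 1 means the playback-trained system fully matched the system trained in the matched environment.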
