Improving DNN-Based Automatic Recognition of Non-native Children Speech with Adult Speech

Yao Qian, Xinhao Wang, Keelan Evanini, David Suendermann-Oeft
The ... Workshop on Child, Computer and Interaction, 2016-09-06
DOI: 10.21437/WOCCI.2016-7 (https://doi.org/10.21437/WOCCI.2016-7)
Citations: 11

Abstract

Acoustic models for state-of-the-art DNN-based speech recognition systems are typically trained using at least several hundred hours of task-specific training data. However, this amount of training data is not always available for some applications. In this paper, we investigate how to use an adult speech corpus to improve DNN-based automatic speech recognition for non-native children's speech. Although there are many acoustic and linguistic mismatches between the speech of adults and children, adult speech can still be used to boost the performance of a speech recognizer for children using acoustic modeling techniques based on the DNN framework. The experimental results show that the best recognition performance can be achieved by combining children's training data with adult training data of approximately the same size and initializing the DNN with the weights obtained by pre-training using the full training set of the adult corpus. This system can outperform the baseline system trained on only children's speech with an overall relative WER reduction of 11.9%. Among the three speaking tasks studied, the picture narration task shows the largest gain with a WER reduction from 24.6% to 20.1%.
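The abstract reports results as word error rate (WER) and relative WER reduction. As a minimal sketch of how those figures are usually computed, the block below uses the standard definition of WER (word-level edit distance divided by reference length). The function names `word_error_rate` and `relative_wer_reduction` are illustrative and are not taken from the paper, whose scoring tools are not described here. Applied to the picture narration figures quoted above, the drop from 24.6% to 20.1% works out to a relative reduction of roughly 18.3%.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Toy transcripts (made up for illustration, not from the paper's data).
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Picture narration figures quoted in the abstract.
    print(round(relative_wer_reduction(24.6, 20.1), 3))
```

Both inputs to `relative_wer_reduction` must use the same unit (both percentages or both fractions), since only their ratio matters.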