Speech Augmentation Using Conditional Generative Adversarial Nets in Mongolian Speech Recognition

Zhiqiang Ma, Jinyi Li, Junpeng Zhang
Published in: 2022 6th Asian Conference on Artificial Intelligence Technology (ACAIT), 2022-12-09
DOI: 10.1109/ACAIT56212.2022.10137828 (https://doi.org/10.1109/ACAIT56212.2022.10137828)
Citations: 0

Abstract

To address the uneven regional distribution of speech caused by the scarcity of labeled data in the Mongolian speech data set, this paper proposes a Mongolian speech data augmentation model based on a conditional generative adversarial network. The model trains a conditional speech generator against multiple fusion discriminators, taking Mongolian text and specified regional features as input and generating Mongolian speech that carries those regional features. For comparison, the original data set was also augmented with speech rate perturbation and with spectrogram enhancement, and end-to-end Mongolian speech recognition models were trained on the original data set and on each augmented data set. The model trained on the data set augmented with specified regional features reaches a word error rate of 3.1%, which is 2%, 0.5%, and 0.8% lower than the models trained on the original data set, the speech-rate-perturbed data set, and the spectrogram-enhanced data set, respectively.
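The results above are reported as word error rate (WER). As a minimal sketch of how that metric is conventionally computed, the function below counts the word-level substitutions, deletions, and insertions needed to turn a reference transcript into a recognizer hypothesis and divides by the number of reference words. The function name and the whitespace tokenization are assumptions made for illustration; the abstract does not describe the authors' scoring setup, and it does not say whether the reported drops are absolute or relative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization by whitespace is an assumption,
    not the scoring procedure used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a WER of 3.1% corresponds to roughly three word errors per hundred reference words, averaged over the test set.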