Speech Augmentation Using Conditional Generative Adversarial Nets in Mongolian Speech Recognition

2022 6th Asian Conference on Artificial Intelligence Technology (ACAIT). Pub Date: 2022-12-09. DOI: 10.1109/ACAIT56212.2022.10137828

Zhiqiang Ma, Jinyi Li, Junpeng Zhang


Citations: 0

Abstract


Speech Augmentation Using Conditional Generative Adversarial Nets in Mongolian Speech Recognition

Labeled data are scarce in the Mongolian speech data set, which leaves speech from different regions unevenly represented. To address this, the paper proposes a Mongolian speech data augmentation model based on a conditional generative adversarial network. The model trains a conditional speech generator against multiple fusion discriminators, taking Mongolian text and specified regional features as input and generating Mongolian speech that carries those regional features. For comparison, the original data set was also augmented with speech rate perturbation and with spectrogram augmentation, and end-to-end Mongolian speech recognition models were trained on each augmented data set and on the original data set. The model trained on the data set augmented with speech of the specified regional characteristics reached a word error rate of 3.1%. Compared with the models trained on the original data set, the speech rate perturbation data set, and the spectrogram augmentation data set, its word error rate was lower by 2%, 0.5%, and 0.8%, respectively.
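The abstract reports its results as word error rate (WER) without defining the metric. The sketch below shows the standard definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a recognizer output, divided by the number of reference words. The function name `word_error_rate` and the placeholder tokens are illustrative assumptions; the abstract does not say which scoring tool the authors used.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)


# Placeholder tokens, not Mongolian data from the paper.
reference = "w1 w2 w3 w4".split()
hypothesis = "w1 x w3".split()  # one substitution (w2 -> x) and one deletion (w4)
print(word_error_rate(reference, hypothesis))
```

Under this definition, a WER of 3.1% corresponds to roughly three word-level errors per hundred reference words. Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.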

