Comparison of CNN-based Speech Dereverberation using Neural Vocoder

2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), published 2021-04-13. DOI: 10.1109/ICAIIC51459.2021.9415259

Chanjun Chun, Kwang Myung Jeon, Chaejun Leem, Bumshik Lee, Wooyeol Choi

Citation count: 4

Abstract

Reverberation degrades speech quality and intelligibility, particularly for hearing-impaired people. In automatic speech recognition (ASR) systems, dereverberation, which removes reverberation, is widely employed as a pre-processing step to improve recognition performance. In this paper, we compare the performance of a CNN-based dereverberation method when combined with various vocoders. A U-Net architecture is employed for dereverberation, and WaveGlow, MelGAN, and Griffin-Lim are used as vocoders. These vocoders convert speech features into time-domain speech samples and are capable of generating high-quality speech from mel-spectrograms. The results were compared using the Perceptual Evaluation of Speech Quality (PESQ) measure. It was confirmed that speech synthesized by a vocoder after reverberation removal achieved a higher PESQ than the reverberant speech.
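The pipeline in the abstract runs reverberant speech through mel-spectrogram features, a U-Net that estimates dereverberated features, and a vocoder (Griffin-Lim, WaveGlow, or MelGAN) that turns them back into a waveform scored with PESQ. The sketch below illustrates only the input side of that pipeline, under stated assumptions: reverberation is modeled as convolution of clean speech with a room impulse response (RIR), and log-mel frames are computed with a hand-rolled mel filterbank. The synthetic RIR (exponentially decaying Gaussian noise with an assumed RT60), the 8 kHz sampling rate, 256-point DFT, 40 mel bands, and the HTK mel formula are illustrative choices, not settings reported in the paper. The U-Net, the vocoders, and PESQ are not implemented here.

```python
import cmath
import math
import random

# Illustrative settings (assumptions, not values from the paper).
SAMPLE_RATE = 8000  # Hz
N_FFT = 256         # frame length and DFT size
N_MELS = 40         # number of mel bands


def hz_to_mel(f: float) -> float:
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def synthetic_rir(rt60: float, length: int, seed: int = 0) -> list[float]:
    """Hypothetical room impulse response: a unit direct-path impulse followed
    by Gaussian noise whose energy decays by 60 dB after rt60 seconds."""
    rng = random.Random(seed)
    decay = 3.0 * math.log(10.0) / rt60  # amplitude decay rate in 1/s
    tail = [0.3 * rng.gauss(0.0, 1.0) * math.exp(-decay * n / SAMPLE_RATE)
            for n in range(1, length)]
    return [1.0] + tail


def convolve(x: list[float], h: list[float]) -> list[float]:
    """Full linear convolution; reverberant speech = clean speech * RIR."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue  # silent input samples contribute nothing
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def mel_filterbank() -> list[list[float]]:
    """Triangular filters with edges spaced evenly on the mel scale, 0 Hz to Nyquist."""
    top = hz_to_mel(SAMPLE_RATE / 2)
    edges = [mel_to_hz(top * i / (N_MELS + 1)) for i in range(N_MELS + 2)]
    bin_hz = [k * SAMPLE_RATE / N_FFT for k in range(N_FFT // 2 + 1)]
    bank = []
    for lo, mid, hi in zip(edges, edges[1:], edges[2:]):
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))  # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel_frame(frame: list[float], bank: list[list[float]]) -> list[float]:
    """Hann window -> DFT power spectrum -> mel filterbank -> natural log (floored)."""
    n = len(frame)
    win = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, s in enumerate(frame)]
    power = [abs(sum(win[i] * cmath.exp(-2j * math.pi * k * i / N_FFT)
                     for i in range(n))) ** 2
             for k in range(N_FFT // 2 + 1)]
    return [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10) for row in bank]


if __name__ == "__main__":
    # 100 ms, 440 Hz tone burst followed by 300 ms of silence.
    burst = SAMPLE_RATE // 10
    clean = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(burst)]
    clean += [0.0] * (3 * SAMPLE_RATE // 10)
    rir = synthetic_rir(rt60=0.25, length=SAMPLE_RATE // 4)
    reverberant = convolve(clean, rir)[: len(clean)]

    bank = mel_filterbank()
    start = burst + SAMPLE_RATE // 20  # frame beginning 50 ms after the burst ends
    for name, sig in (("clean", clean), ("reverberant", reverberant)):
        frame = sig[start:start + N_FFT]
        mean_log_mel = sum(log_mel_frame(frame, bank)) / N_MELS
        print(f"{name:12s} mean log-mel energy in tail frame: {mean_log_mel:.2f}")
```

The direct convolution and naive DFT keep the sketch dependency-free; an FFT-based library would normally be used instead. In the demo, the clean tail frame holds only the 1e-10 floor, while the reverberant frame still carries energy smeared in from the burst, which is the kind of late energy a dereverberation model is meant to suppress before the vocoder resynthesizes the waveform.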
