Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning

Cunhang Fan, B. Liu, J. Tao, Jiangyan Yi, Zhengqi Wen, Leichao Song
{"title":"Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning","authors":"Cunhang Fan, B. Liu, J. Tao, Jiangyan Yi, Zhengqi Wen, Leichao Song","doi":"10.1109/ISCSLP49672.2021.9362059","DOIUrl":null,"url":null,"abstract":"Recurrent neural networks (RNNs) have shown significant improvements in recent years for speech enhancement. However, the model complexity and inference time cost of RNNs are much higher than deep feed-forward neural networks (DNNs). Therefore, these limit the applications of speech enhancement. This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning. The TDNN has excellent potential for capturing long range temporal contexts, which utilizes a modular and incremental design. Besides, the TDNN preserves the feed-forward structure so that its inference cost is comparable to standard DNN. To make full use of the training data, we propose a full data learning method for speech enhancement. More specifically, we not only use the noisy-to-clean (input-to-target) to train the enhanced model, but also the clean-to-clean and noise-to-silence data. Therefore, all of the training data can be used to train the enhanced model. Our experiments are conducted on TIMIT dataset. Experimental results show that our proposed method could achieve a better performance than DNN and comparable even better performance than BLSTM. Meanwhile, compared with the BLSTM, the proposed method drastically reduce the inference time.","PeriodicalId":279828,"journal":{"name":"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISCSLP49672.2021.9362059","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Recurrent neural networks (RNNs) have delivered significant improvements in speech enhancement in recent years. However, their model complexity and inference time cost are much higher than those of deep feed-forward neural networks (DNNs), which limits their practical application to speech enhancement. This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning. The TDNN captures long-range temporal context through a modular and incremental design, while preserving a feed-forward structure so that its inference cost is comparable to that of a standard DNN. To make full use of the training data, we propose a full data learning method for speech enhancement: in addition to the noisy-to-clean (input-to-target) pairs, the model is also trained on clean-to-clean and noise-to-silence pairs, so that all of the training data are used to train the enhancement model. Our experiments are conducted on the TIMIT dataset. Experimental results show that the proposed method outperforms the DNN baseline and achieves performance comparable to, and in some cases better than, the BLSTM, while drastically reducing inference time compared with the BLSTM.
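
To make the two ideas in the abstract concrete, the sketch below shows one possible reading of them in PyTorch: a feed-forward TDNN stack built from dilated 1-D convolutions over frames (so deeper layers see a wider temporal context without recurrence), plus a helper that builds the three input-to-target pairings of full data learning. This is an illustrative sketch, not the authors' implementation; the layer sizes, context widths, dilations, the 257-dimensional spectral feature, and the all-zero "silence" target are assumptions.

```python
# Illustrative sketch of a TDNN enhancement stack and full-data-learning pairs.
# Not the paper's code: sizes, contexts, dilations, and features are assumptions.
import torch
import torch.nn as nn


class TDNNLayer(nn.Module):
    """One time-delay layer: a dilated 1-D convolution over frames,
    followed by ReLU and batch norm (purely feed-forward)."""
    def __init__(self, in_dim, out_dim, context=3, dilation=1):
        super().__init__()
        self.conv = nn.Conv1d(in_dim, out_dim, kernel_size=context,
                              dilation=dilation,
                              padding=dilation * (context - 1) // 2)
        self.bn = nn.BatchNorm1d(out_dim)

    def forward(self, x):           # x: (batch, feat_dim, frames)
        return self.bn(torch.relu(self.conv(x)))


class TDNNEnhancer(nn.Module):
    """Stacked TDNN layers; increasing dilation widens the temporal
    context layer by layer (modular, incremental design)."""
    def __init__(self, feat_dim=257, hidden=512):   # 257 = assumed STFT bins
        super().__init__()
        self.net = nn.Sequential(
            TDNNLayer(feat_dim, hidden, context=5, dilation=1),
            TDNNLayer(hidden, hidden, context=3, dilation=2),
            TDNNLayer(hidden, hidden, context=3, dilation=4),
            nn.Conv1d(hidden, feat_dim, kernel_size=1),  # frame-wise output
        )

    def forward(self, noisy_spec):  # (batch, feat_dim, frames)
        return self.net(noisy_spec)


def full_data_pairs(clean, noise, noisy):
    """Build the three input->target pairs of full data learning:
    noisy->clean, clean->clean, and noise->silence (all-zero target)."""
    silence = torch.zeros_like(noise)
    return [(noisy, clean), (clean, clean), (noise, silence)]
```

A training loop would then draw minibatches from all three pair types and minimize, for example, a mean-squared-error loss between the network output and the target spectrum, so every utterance (clean, noise, or noisy) contributes a training target.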