A Preliminary Study of Employing Lowpass-Filtered and Time-Reversed Feature Sequences as Data Augmentation for Speech Enhancement Deep Networks
Che-Wei Liao, Ping-Chen Wu, J. Hung
2022 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), 2022-11-22
DOI: 10.1109/ISPACS57703.2022.10082819 (https://doi.org/10.1109/ISPACS57703.2022.10082819)
Citations: 0
Abstract
The effectiveness of deep neural network (DNN)-based speech enhancement (SE) techniques depends primarily on the amount and diversity of the training data. When only a small training set is available, data augmentation is often used to enlarge it, which mitigates overfitting and thus improves the generalization capability of the learned network. In this study, we present two feature-based data augmentation methods for training an SE network. Given the original feature sequences in the training set, we create two augmented versions of each: a lowpass-filtered sequence obtained with the discrete wavelet transform (DWT) and a time-reversed sequence. These two augmented sequences are then used together with the original ones to train the SE network. Preliminary experimental results indicate that the presented data augmentation methods improve an ideal-ratio-mask (IRM) network, yielding higher perceptual evaluation of speech quality (PESQ) scores for the noisy utterances in the test set.
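The two augmentations described in the abstract can be illustrated with a minimal sketch. The abstract does not state which wavelet, how many decomposition levels, or which axis the DWT is applied along, nor how the training targets are handled, so everything below is an assumption made for illustration: a one-level Haar DWT along the time (frame) axis with the detail band zeroed before inversion, and hypothetical helper names `haar_lowpass`, `time_reverse`, and `augment`. It is not the authors' implementation.

```python
from typing import List

Sequence = List[List[float]]  # frames x feature dimensions


def haar_lowpass(features: Sequence) -> Sequence:
    """Assumed lowpass step: one-level Haar DWT along time, detail band zeroed.

    For the Haar wavelet, zeroing the detail coefficients and inverting
    reduces to replacing each pair of consecutive frames with their average.
    An odd trailing frame is left unchanged.
    """
    out = [list(frame) for frame in features]
    for t in range(0, len(features) - 1, 2):
        averaged = [(a + b) / 2.0 for a, b in zip(features[t], features[t + 1])]
        out[t] = averaged
        out[t + 1] = list(averaged)
    return out


def time_reverse(features: Sequence) -> Sequence:
    """Reverse the frame order; each frame's feature vector is untouched."""
    return [list(frame) for frame in reversed(features)]


def augment(training_set: List[Sequence]) -> List[Sequence]:
    """Return each original sequence followed by its two augmented copies."""
    augmented = []
    for seq in training_set:
        augmented.extend([seq, haar_lowpass(seq), time_reverse(seq)])
    return augmented


# Toy feature sequence (made-up values): 3 frames, 2 feature dimensions.
toy = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]
enlarged = augment([toy])  # three sequences: original, lowpass, reversed
```

Under these assumptions the augmented training set is three times the size of the original one, and both augmented sequences keep the original number of frames. In a supervised setup, any frame-level target such as the IRM would presumably need a matching transformation so that inputs and targets stay aligned; the abstract leaves this detail open.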