Authors: Che-Wei Liao, Ping-Chen Wu, J. Hung
Venue: 2022 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)
Published: 2022-11-22
DOI: https://doi.org/10.1109/ISPACS57703.2022.10082819
A Preliminary Study of Employing Lowpass-Filtered and Time-Reversed Feature Sequences as Data Augmentation for Speech Enhancement Deep Networks
The efficacy of deep neural network (DNN)-based speech enhancement (SE) techniques relies primarily on the amount and variety of training data. When only a small training set is available, data augmentation is often used to enlarge it, which helps avoid overfitting and thus improves the generalization capability of the learned network. In this study, we present two feature-based data augmentation methods for training an SE network. Given the original feature sequences in the training set, we create two corresponding augmented versions: lowpass-filtered sequences obtained with the discrete wavelet transform (DWT), and time-reversed sequences. These two sets of augmented sequences are then used together with the original ones to train the SE network. Preliminary experimental results indicate that the presented data augmentation methods can improve an ideal-ratio-mask (IRM) network, yielding higher Perceptual Evaluation of Speech Quality (PESQ) scores for the enhanced noisy utterances in the test set.
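
The two augmentations described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not state the wavelet, the number of decomposition levels, or the axis being filtered, so the code assumes a one-level Haar DWT applied along the time (frame) axis of a `(frames, feature_dim)` array. The function names `haar_lowpass`, `time_reverse`, and `augment` are hypothetical.

```python
import numpy as np


def haar_lowpass(features: np.ndarray) -> np.ndarray:
    """Lowpass-filter a (frames, dims) sequence along time.

    Assumption: one-level Haar DWT, detail band set to zero, then inverse DWT.
    With zero detail coefficients, each pair of adjacent frames is replaced
    by its mean.
    """
    n_frames = features.shape[0]
    # Haar pairs frames two by two; repeat the last frame if the count is odd.
    if n_frames % 2:
        padded = np.concatenate([features, features[-1:]], axis=0)
    else:
        padded = features
    even, odd = padded[0::2], padded[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # approximation (lowpass) band
    # Inverse Haar with detail = 0: both frames of a pair become approx / sqrt(2).
    recon = np.repeat(approx / np.sqrt(2.0), 2, axis=0)
    return recon[:n_frames]


def time_reverse(features: np.ndarray) -> np.ndarray:
    """Return the sequence with its frame order reversed."""
    return features[::-1].copy()


def augment(features: np.ndarray) -> list:
    """Return the original sequence plus its two augmented versions."""
    return [features, haar_lowpass(features), time_reverse(features)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((7, 4))  # toy stand-in for a feature sequence
    for name, seq in zip(["original", "lowpass", "reversed"], augment(feats)):
        print(name, seq.shape)
```

In a supervised setup the training targets (for example, the IRM frames) would presumably need to stay aligned with each augmented input, such as by reversing them alongside the time-reversed features; how the paper handles this is not stated in the abstract.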