A ChannelWise weighting technique of slice-based Temporal Convolutional Network for noisy speech enhancement
Authors: Wei-Tyng Hong, Kuldeep Singh Rana
Journal: Computer Speech and Language
Published: 2023-10-11
DOI: 10.1016/j.csl.2023.101572
URL: https://www.sciencedirect.com/science/article/pii/S0885230823000918
Citations: 0
Abstract
In recent years, Temporal Convolutional Networks (TCNs) have driven significant progress in single-channel noisy speech enhancement. However, TCN-based systems still face challenges, such as limited use of network channel depth for modeling long-range dependencies, and issues with weight sharing. To address these challenges, this paper proposes a novel channel-wise weighting scheme designed specifically for the sliced TCN framework. The scheme multiplies each channel of a TCN slice element-wise by a set of shifting weights. Using a cyclic-shift approach, these weights capture information from neighboring channels, revealing the dependencies between adjacent channels. The channel-wise weighted TCN outputs are combined and used to estimate a masking function, which suppresses noise components and improves speech quality. To train and evaluate the method, we use speech datasets containing various noise types at different levels. To optimize the end-to-end enhancement system, we adopt the Scale-Invariant Signal-to-Noise Ratio (SI-SNR) as the objective function. Experimental results demonstrate the effectiveness of the proposed TCN channel-wise weighting method, with an average SI-SNR improvement of approximately 9.8% on the unseen-noise dataset. This improvement was measured at an SNR of −3 dB, comparing the non-channel-wise weighting scheme with the proposed channel-wise weighting scheme within the Multi-slicing TCN framework. The main advantage of the proposed approach is its ability to address uneven and biased outputs from TCN slices, particularly for highly non-stationary noisy speech containing speech-like noise. This leads to more robust performance across a range of real-world applications.
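As a rough illustration, the Python sketch below covers two ideas the abstract mentions. `si_snr` follows the standard scale-invariant SNR definition (remove the mean, project the estimate onto the target, report the target-to-residual energy ratio in dB). `cyclic_channel_weighting` is only one hypothetical reading of "cyclically shifted" channel-wise weights: each output channel is a per-channel weighted sum of itself and its cyclic neighbors. The function names, the `offsets` parameter, and the weight layout are assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

EPS = 1e-8  # small constant to avoid division by zero and log of zero


def si_snr(estimate: Sequence[float], target: Sequence[float]) -> float:
    """Scale-invariant SNR in dB (standard definition)."""
    n = len(target)
    # Remove the mean so the measure ignores DC offsets.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    s = [x - mean_t for x in target]
    # Project the estimate onto the target: s_target = (<e, s> / ||s||^2) * s
    alpha = sum(a * b for a, b in zip(e, s)) / (sum(b * b for b in s) + EPS)
    s_target = [alpha * b for b in s]
    # Whatever remains after the projection is treated as noise.
    e_noise = [a - b for a, b in zip(e, s_target)]
    num = sum(x * x for x in s_target)
    den = sum(x * x for x in e_noise)
    return 10.0 * math.log10((num + EPS) / (den + EPS))


def cyclic_channel_weighting(
    x: List[List[float]],
    weights: Dict[int, List[float]],
    offsets: Sequence[int] = (-1, 0, 1),
) -> List[List[float]]:
    """HYPOTHETICAL channel-wise weighting with cyclic channel shifts.

    x: C channels of T frames each (e.g. the output of one TCN slice).
    weights[k][c]: weight applied to channel (c + k) mod C when producing
    output channel c (a stand-in for learned parameters).
    y[c][t] = sum over k of weights[k][c] * x[(c + k) % C][t]
    """
    num_ch, num_frames = len(x), len(x[0])
    y = [[0.0] * num_frames for _ in range(num_ch)]
    for k in offsets:
        w = weights[k]
        for c in range(num_ch):
            src = x[(c + k) % num_ch]  # wrap around at the channel edges
            for t in range(num_frames):
                y[c][t] += w[c] * src[t]
    return y
```

Used as a training objective, one would minimize `-si_snr(enhanced, clean)`. In this sketch, weighting only a few cyclic neighbors (rather than learning a full channel-mixing matrix) keeps the cost at one weight per channel per offset; whether the paper makes the same trade-off is not stated in the abstract.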
About the journal:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.