Title: Densely Connected Dilated Convolutions with Time-Frequency Attention for Speech Enhancement
Authors: Manaswini Burra, Pavan Kumar Reddy Yerva, Balaji Eemani, Abhinash Sunkara
DOI: 10.1109/ICAAIC56838.2023.10140871 (https://doi.org/10.1109/ICAAIC56838.2023.10140871)
Published: 2023-05-04, 2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC)
Citations: 0
Abstract
This study proposes a Dilated Dense Time Frequency Attention Autoencoder (DDTFAAEC) model for real-time speech enhancement. The model is a fully convolutional neural network with time-frequency attention (TFA); TFA blocks follow the convolutional and dense layers in both the encoder and decoder. Dense blocks and attention modules assist feature extraction by combining feature reuse, deeper networks, and maximal context aggregation. The TFA mechanism is designed to learn important information along the time, channel, and frequency dimensions of a convolutional neural network (CNN). Context aggregation at multiple resolutions is achieved with dilated convolutions. Causal convolutions are used to prevent information flowing from future frames, which makes the network suitable for real-time applications. The decoder uses sub-pixel convolutional layers for upsampling. In terms of quality scores and objective intelligibility, the experimental results outperform existing methods.
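Two of the building blocks named in the abstract, causal dilated convolution (no dependence on future frames) and sub-pixel upsampling, can be sketched in plain Python. This is an illustrative 1-D, single-channel toy with hypothetical function names, not the paper's implementation:

```python
def causal_dilated_conv1d(x, kernel, dilation=1):
    """Causal dilated 1-D convolution: the output at step t depends only
    on inputs at steps <= t, so no future frames leak in (real-time safe)."""
    k = len(kernel)
    # Left-pad with zeros so every kernel tap reaches only past/current samples.
    pad = (k - 1) * dilation
    padded = [0.0] * pad + list(x)
    out = []
    for t in range(len(x)):
        # Taps are spaced `dilation` samples apart, ending at time t.
        acc = sum(kernel[i] * padded[t + pad - i * dilation] for i in range(k))
        out.append(acc)
    return out


def subpixel_upsample_1d(channels):
    """Sub-pixel upsampling (1-D pixel shuffle): r feature channels of
    length L are interleaved into a single channel of length L * r."""
    r = len(channels)
    upsampled_len = len(channels[0]) * r
    return [channels[t % r][t // r] for t in range(upsampled_len)]


# With dilation=2 the two-tap kernel [1, 1] sums x[t] and x[t-2],
# so early outputs see only zero padding, never future samples.
print(causal_dilated_conv1d([1, 2, 3, 4], [1, 1], dilation=2))  # [1.0, 2.0, 4.0, 6.0]
# Two channels of length 2 interleave into one signal of length 4.
print(subpixel_upsample_1d([[1, 3], [2, 4]]))  # [1, 2, 3, 4]
```

Stacking such causal layers with increasing dilation grows the receptive field over past context only, which is the property that keeps the enhancement network usable frame-by-frame in real time.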