A new speech enhancement method based on Swin-UNet model
Chengli Sun, Weiqi Jiang, Y. Leng, Feilong Chen
Noise Control Engineering Journal, 2023-07-01. DOI: 10.3397/1/377122
The U-shaped network (UNet) has shown excellent performance in a variety of speech enhancement tasks. However, because convolution is an intrinsically local operation, a traditional UNet built from convolutional neural network (CNN) layers cannot capture global, long-term information well.
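One way to see the locality limitation: a stack of stride-1 convolutions with kernel size k widens its receptive field by only dilation × (k − 1) frames per layer, so covering a long context takes many layers. The minimal sketch below computes this along the time axis; the kernel sizes, depths, and the 100-frame target are illustrative assumptions, not the paper's configuration, and the helper names are hypothetical.

```python
import math


def receptive_field(num_layers: int, kernel_size: int, dilation: int = 1) -> int:
    """Receptive field, in frames, of `num_layers` stacked stride-1 1-D convolutions.

    Each layer adds dilation * (kernel_size - 1) frames to the field.
    """
    if num_layers < 0 or kernel_size < 1 or dilation < 1:
        raise ValueError("need num_layers >= 0, kernel_size >= 1, dilation >= 1")
    return 1 + num_layers * dilation * (kernel_size - 1)


def layers_needed(target_frames: int, kernel_size: int, dilation: int = 1) -> int:
    """Smallest depth whose receptive field spans at least `target_frames` frames."""
    if target_frames < 1 or kernel_size < 2 or dilation < 1:
        raise ValueError("need target_frames >= 1, kernel_size >= 2, dilation >= 1")
    return math.ceil((target_frames - 1) / (dilation * (kernel_size - 1)))


if __name__ == "__main__":
    # Illustrative values only: a plain 3-tap convolution stack.
    for depth in (1, 4, 8, 16):
        print(f"{depth:2d} layers -> {receptive_field(depth, kernel_size=3)} frames")
    # Depth required before a 3-tap stack "sees" a 100-frame context window.
    print(layers_needed(100, kernel_size=3))
```

Downsampling in a UNet encoder enlarges the receptive field faster than this linear rule, but each layer still mixes only nearby frames, whereas a single self-attention layer over N frames relates all N of them directly.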
In this work, we propose a new speech enhancement method based on Swin-UNet. Unlike the traditional UNet, all CNN blocks are replaced with Swin-Transformer blocks to exploit more multi-scale contextual information. The Swin-UNet model employs a shifted-window mechanism, which not only avoids the high computational complexity of the standard Transformer but also strengthens global information exchange by drawing on the Transformer's strong global modeling capability. Through hierarchical Swin-Transformer blocks, both global and local speech features can be fully leveraged to improve speech reconstruction. Experimental results confirm that the proposed method eliminates more background noise while maintaining good objective speech quality.
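The shifted-window mechanism can be sketched on a 2-D feature map, assumed here to be a time-frequency representation (the abstract does not specify the input layout). Attention is computed only inside non-overlapping M × M windows, and alternate blocks cyclically shift the map by M/2 before partitioning so that cells from neighbouring windows meet. The cost functions use the complexity expressions from the original Swin Transformer paper: 4hwC² + 2(hw)²C for global self-attention versus 4hwC² + 2M²hwC for window attention. The 8 × 8 map, window size 4, and the 64 × 64 × 96 sizes are illustrative assumptions, and all function names are hypothetical.

```python
from typing import List, Set

Grid = List[List[int]]


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll a 2-D grid up and left by `shift` cells, wrapping around the edges."""
    h, w = len(grid), len(grid[0])
    return [[grid[(i + shift) % h][(j + shift) % w] for j in range(w)] for i in range(h)]


def window_partition(grid: Grid, m: int) -> List[Grid]:
    """Split an (h, w) grid into non-overlapping m x m windows in row-major order."""
    h, w = len(grid), len(grid[0])
    if h % m or w % m:
        raise ValueError("grid height and width must be divisible by the window size")
    return [
        [row[c:c + m] for row in grid[r:r + m]]
        for r in range(0, h, m)
        for c in range(0, w, m)
    ]


def source_windows(window: Grid, m: int, width: int) -> Set[int]:
    """Indices of the unshifted windows that the cells of `window` came from.

    Assumes each cell holds its original flat position r * width + c.
    """
    return {(v // width // m) * (width // m) + (v % width) // m for row in window for v in row}


def msa_flops(h: int, w: int, c: int) -> int:
    """Cost of global multi-head self-attention on an h x w map with c channels."""
    return 4 * h * w * c * c + 2 * (h * w) ** 2 * c


def wmsa_flops(h: int, w: int, c: int, m: int) -> int:
    """Cost of window attention with m x m windows: linear, not quadratic, in h * w."""
    return 4 * h * w * c * c + 2 * m * m * h * w * c


if __name__ == "__main__":
    # Hypothetical 8 x 8 map; each cell stores its flat position.
    grid = [[i * 8 + j for j in range(8)] for i in range(8)]
    regular = window_partition(grid, 4)
    shifted = window_partition(cyclic_shift(grid, 2), 4)
    print("regular window 0 draws from", source_windows(regular[0], 4, 8))
    print("shifted window 0 draws from", source_windows(shifted[0], 4, 8))
    # Illustrative sizes: 64 x 64 map, 96 channels, 8 x 8 windows.
    print(msa_flops(64, 64, 96) / wmsa_flops(64, 64, 96, 8))
```

In this toy case the first shifted window contains cells from all four regular windows, which is how information crosses window borders over successive blocks. A full Swin block also masks attention between cells that only became adjacent through the wrap-around; that mask is omitted from this sketch.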
About the journal:
NCEJ is the pre-eminent academic journal of noise control. It is the International Journal of the Institute of Noise Control Engineering of the USA. It is also produced with the participation and assistance of the Korean Society of Noise and Vibration Engineering (KSNVE).
NCEJ reaches noise control professionals around the world, covering over 50 national noise control societies and institutes.
INCE encourages you to submit your next paper to NCEJ. Choosing NCEJ:
Provides the opportunity to reach a global audience of NCE professionals, academics, and students;
Enhances the prestige of your work;
Validates your work by formal peer review.