Multi-speaker Speech Separation under Reverberation Conditions Using Conv-Tasnet
Authors: Chunxi Wang, Maoshen Jia, Yanyan Zhang, Lu Li
Journal of Advances in Information Technology, published 2023-01-01
DOI: 10.12720/jait.14.4.694-700 (https://doi.org/10.12720/jait.14.4.694-700)
Citations: 1
Abstract
The goal of speech separation is to separate the target signal from background interference. With the rapid development of artificial intelligence, deep-learning-based speech separation has attracted growing attention and made considerable progress. However, in the “cocktail party problem”, separating speech under reverberant conditions remains a challenge. To address this, this paper proposes a model that combines the Weighted Prediction Error (WPE) method with a fully-convolutional time-domain audio separation network (Conv-Tasnet). The model aims to separate multi-channel signals after dereverberation without prior knowledge of the sound field environment. Subjective and objective evaluation results show that the proposed method outperforms existing methods on speech separation tasks in both reverberant and anechoic environments.
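The dereverberation stage named in the abstract can be illustrated with a minimal NumPy sketch of textbook WPE applied to a multi-channel STFT. This is an illustration under stated assumptions, not the authors' implementation: the function names `wpe_one_bin` and `wpe`, the tap count, the prediction delay, the iteration count, and the regularization constants are all hypothetical choices, and the abstract does not give the paper's settings. The Conv-Tasnet separation stage is not shown; in a pipeline of this kind the dereverberated STFT would be converted back to waveforms and passed to the separation network.

```python
import numpy as np


def wpe_one_bin(Y, taps=5, delay=2, iterations=3, eps=1e-8):
    """Textbook WPE for one frequency bin (illustrative sketch).

    Y: complex array of shape (channels, frames).
    Returns the dereverberated estimate with the same shape.
    """
    D, T = Y.shape
    n = D * taps

    # Stack delayed copies of the observation: row block k holds Y(t - delay - k).
    Y_tilde = np.zeros((n, T), dtype=complex)
    for k in range(taps):
        shift = delay + k
        if shift < T:
            Y_tilde[k * D:(k + 1) * D, shift:] = Y[:, :T - shift]

    X = Y.copy()
    for _ in range(iterations):
        # Time-varying power of the current estimate, averaged over channels.
        lam = np.maximum(np.mean(np.abs(X) ** 2, axis=0), eps)
        Yw = Y_tilde / lam
        R = Yw @ Y_tilde.conj().T          # weighted correlation matrix (n, n)
        P = Yw @ Y.conj().T                # weighted cross-correlation (n, D)
        # Small diagonal loading (an assumption) keeps the solve well-posed.
        R = R + (1e-6 * np.trace(R).real / n + 1e-10) * np.eye(n)
        G = np.linalg.solve(R, P)          # prediction filter (n, D)
        # Subtract the predicted late reverberation from the observation.
        X = Y - G.conj().T @ Y_tilde
    return X


def wpe(stft, taps=5, delay=2, iterations=3):
    """Apply wpe_one_bin independently to each bin of stft (freq, channels, frames)."""
    return np.stack([
        wpe_one_bin(stft[f], taps=taps, delay=delay, iterations=iterations)
        for f in range(stft.shape[0])
    ])
```

The prediction delay keeps the filter from cancelling the direct sound and early reflections, so only the late reverberation is predicted from past frames and subtracted. Practical implementations typically smooth the power estimate over neighbouring frames for robustness, which this sketch omits.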