An improved deep learning approach for speech enhancement
Authors: Malek Miled, M. B. Ben Messaoud
Journal: U.Porto Journal of Engineering
Publication date: 2023-11-24
Publication type: Journal Article
DOI: 10.24840/2183-6493_009-005_001531
URL: https://doi.org/10.24840/2183-6493_009-005_001531
Citation count: 0
Abstract
Single-channel speech enhancement is the task of improving the quality and intelligibility of a speech signal in a noisy environment. Approaches fall into two main categories: time-domain methods and time-frequency-domain methods. In this paper, we propose an approach based on a cross-domain framework. The framework exploits knowledge of the spectrogram and overcomes some of the limitations of time-frequency-domain methods. First, we apply the intrinsic mode functions of the empirical mode decomposition and an improved version of principal component analysis. Then, we design a cross-domain learning framework to determine the correlations along the frequency and time axes. At a low SNR of -5 dB, objective and subjective measures demonstrate the effectiveness of the proposed approach, with average scores of -0.49, 2.47, 2.44, and 0.68 for SegSNR, PESQ, Cov, and STOI, respectively. These results highlight the success of our approach under low-SNR conditions.
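The abstract reports SegSNR without defining it. As a minimal sketch, assuming the common frame-wise definition (the mean of per-frame SNR values in dB, each clamped to a fixed range), it could be computed as follows. The function name `segmental_snr`, the 256-sample non-overlapping frames, and the -10 to 35 dB clamp are illustrative assumptions and are not taken from the paper, whose exact settings may differ.

```python
import numpy as np

def segmental_snr(clean, enhanced, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Mean per-frame SNR in dB between a clean reference and an enhanced signal.

    frame_len, lo_db and hi_db are assumed defaults, not values from the paper.
    """
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    n_frames = min(len(clean), len(enhanced)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    # Non-overlapping frames; any trailing partial frame is dropped.
    s = clean[: n_frames * frame_len].reshape(n_frames, frame_len)
    e = enhanced[: n_frames * frame_len].reshape(n_frames, frame_len)
    eps = 1e-12  # avoids division by zero and log of zero
    signal_energy = np.sum(s ** 2, axis=1)
    error_energy = np.sum((s - e) ** 2, axis=1)
    snr_db = 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))
    # Clamp each frame so silent or near-perfect frames do not dominate the mean.
    return float(np.mean(np.clip(snr_db, lo_db, hi_db)))

# Synthetic usage: a 220 Hz tone mixed with white noise at a global SNR of -5 dB.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 220.0 * t)
noise = rng.standard_normal(clean.shape)
# Scale the noise so that 10*log10(P_clean / P_noise) equals -5 dB.
noise *= np.sqrt(np.mean(clean ** 2) / (np.mean(noise ** 2) * 10 ** (-5 / 10)))
noisy = clean + noise
print(segmental_snr(clean, noisy))
```

For a stationary tone in white noise the segmental value stays close to the global SNR. On real speech it is typically lower, because low-energy frames pull the average down, which is consistent with a negative SegSNR remaining possible after enhancement at -5 dB input SNR.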