Multi-channel dictionary learning speech enhancement based on power spectrum
Authors: Tongzheng Ni, Junfeng Wei, Jiarong Wu, Lanfang Zhang, Weidong Tang
Published in: Symposium on Advances in Electrical, Electronics and Computer Engineering, vol. 12704, 2023-05-31
DOI: 10.1117/12.2680516 (https://doi.org/10.1117/12.2680516)
Algorithms that model and estimate noise from its statistical properties, such as spectral subtraction, can estimate the distribution of stationary noise, but their performance degrades on non-stationary noise. Dictionary learning and sparse representation algorithms have proven effective at suppressing non-stationary noise. In practice, however, multi-channel speech enhancement based on dictionary learning requires the spectral subtraction threshold to be set manually. Adaptive estimation of this threshold is therefore important for obtaining optimized noise reduction. This work defines a method for computing the spectral subtraction threshold from the power spectrum of the signal and uses that threshold to improve the quality of the enhanced speech. Experimental comparison shows that the threshold computed from the power spectrum is closer to the optimal value than a fixed threshold. In a -10 dB noise environment, the multi-channel dictionary learning algorithm based on the improved power spectrum raises the segmental signal-to-noise ratio by 1-2 dB relative to spectral subtraction and non-negative matrix factorization, and improves the perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) scores by an average of 2.3 and 0.11 points, respectively. The results show that the algorithm effectively removes additive noise under both non-stationary and stationary noise conditions.
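The segmental signal-to-noise ratio quoted above is conventionally computed frame by frame and then averaged. The following is a minimal sketch of that metric, not the authors' evaluation code: the function name `segmental_snr`, the frame length, the hop size, and the clamping range of -10 dB to 35 dB are all assumptions chosen for illustration, since the abstract does not state them.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, hop=128,
                  lo_db=-10.0, hi_db=35.0):
    """Mean of per-frame SNRs in dB, each clamped to [lo_db, hi_db].

    All default parameter values are illustrative assumptions.
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")
    eps = 1e-12  # keeps the ratio finite for silent or error-free frames
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, hop):
        sig = 0.0  # energy of the clean reference in this frame
        err = 0.0  # energy of the residual error in this frame
        for i in range(start, start + frame_len):
            sig += clean[i] ** 2
            err += (clean[i] - enhanced[i]) ** 2
        snr_db = 10.0 * math.log10((sig + eps) / (err + eps))
        # Clamp so that a few extreme frames do not dominate the mean.
        frame_snrs.append(min(max(snr_db, lo_db), hi_db))
    if not frame_snrs:
        raise ValueError("signal shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)


# Toy usage: a sine wave and a copy attenuated to 90% of its amplitude.
clean = [math.sin(2 * math.pi * 5 * n / 256) for n in range(1024)]
attenuated = [0.9 * x for x in clean]
print(round(segmental_snr(clean, attenuated), 3))
```

Under these assumptions, an improvement of 1-2 dB corresponds to the difference between the values this function returns for two enhanced signals scored against the same clean reference.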