Spectrogram-based speech enhancement by spatial attention generative adversarial networks

International Conference on Digital Image Processing. Pub Date: 2022-10-12. DOI: 10.1117/12.2644385

Haixin Luo, Shengyu Lu, Qian Wei, Yu Fu, Jindong Tian

Citations: 0

Abstract

Spectrogram-based speech enhancement by spatial attention generative adversarial networks

A spectrogram clearly shows the frequency composition of a speech signal. This paper proposes a speech enhancement method based on deep-learning image processing that optimizes the spectrogram of a laser-detected speech signal. The beam of a laser Doppler vibrometer (LDV) is focused on a glass window to detect the vibration caused by sound waves, and the audio signal that caused the vibration is recovered after conversion. Because of speckle noise and air disturbance, the detected speech signal has a low signal-to-noise ratio (SNR) and contains non-stationary noise. Traditional methods struggle to extract weak signals under such severe noise interference, so we use deep learning to denoise the spectrogram and enhance the speech information. By processing the spectrogram of the noisy speech with a generative adversarial network (GAN) combined with a spatial attention mechanism, and by introducing short-time objective intelligibility (STOI) into the loss function, the laser-detected speech signal was successfully enhanced.
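To make the first claim concrete, the sketch below computes a magnitude spectrogram, the time-frequency image that the method treats as its input. It is only an illustration under stated assumptions, not the authors' pipeline: the function names (`hann`, `spectrogram`, `peak_frequency`), the frame length, the hop size, the 8 kHz sampling rate, and the synthetic test tone are all hypothetical choices, and the GAN, the spatial attention mechanism, and the STOI loss term are not reproduced here.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram of a real signal.

    Returns a list of time frames; each frame is a list of
    frame_len // 2 + 1 magnitudes (frequency bins from 0 to Nyquist).
    The frame length and hop size are illustrative assumptions.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Window the current frame to reduce spectral leakage.
        seg = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Direct DFT of the windowed frame at bin k.
            acc = sum(
                seg[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                for i in range(frame_len)
            )
            row.append(abs(acc))
        frames.append(row)
    return frames


def peak_frequency(row, sample_rate, frame_len):
    """Frequency in Hz of the strongest bin in one spectrogram frame."""
    k = max(range(len(row)), key=row.__getitem__)
    return k * sample_rate / frame_len


# Synthetic stand-in for a speech component: a 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(512)]
spec = spectrogram(tone, frame_len=64, hop=32)
print(len(spec), "frames x", len(spec[0]), "bins")
print("peak frequency of first frame:", peak_frequency(spec[0], SAMPLE_RATE, 64), "Hz")
```

With these settings each bin spans 125 Hz, so the tone concentrates its energy in a single column of the image. In a noisy recording, the image-based denoising described above would operate on a time-frequency array of this kind.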
