Spectrogram-based speech enhancement by spatial attention generative adversarial networks
Haixin Luo, Shengyu Lu, Qian Wei, Yu Fu, Jindong Tian
International Conference on Digital Image Processing, published 2022-10-12
DOI: 10.1117/12.2644385 (https://doi.org/10.1117/12.2644385)
Citations: 0
Abstract
A spectrogram clearly shows the frequency composition of a speech signal. This paper proposes a speech enhancement method based on deep-learning image processing that optimizes the spectrogram of a laser-detected speech signal. The beam of a laser Doppler vibrometer (LDV) is focused on a glass window to detect the vibration caused by sound waves, and the audio that caused the vibration is recovered by converting the measured signal. Because of speckle noise and air disturbance, the detected speech signal has a low signal-to-noise ratio (SNR) and contains non-stationary noise. Traditional methods struggle to extract weak signals under such severe noise interference, so we use deep learning to denoise the spectrogram and enhance the speech information. The spectrogram of the noisy speech is processed by a generative adversarial network (GAN) combined with a spatial attention mechanism, with short-time objective intelligibility (STOI) introduced into the loss function. With this approach, the laser-detected speech signal was successfully enhanced.
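The abstract treats the spectrogram as the image that the network operates on. As a rough illustration only, the sketch below computes a log-magnitude spectrogram from a noisy signal with a short-time Fourier transform. The function name `log_spectrogram`, the frame length, hop size, Hann window, 8 kHz sample rate, and the synthetic test tone are all assumptions made for this example; the abstract does not state the authors' actual settings.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Return a log-magnitude spectrogram in dB.

    Output shape is (n_frames, frame_len // 2 + 1).
    frame_len and hop are illustrative defaults, not values from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")

    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop

    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )

    # One-sided FFT of each frame; keep only the magnitude.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))

    # eps avoids log10(0) for silent bins.
    return 20.0 * np.log10(magnitude + eps)


# Synthetic stand-in for a noisy recording: a 1 kHz tone plus white noise.
fs = 8000  # assumed sample rate in Hz
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
noisy = np.sin(2 * np.pi * 1000 * t) + 0.5 * rng.standard_normal(fs)

spec = log_spectrogram(noisy)
peak_bin = int(np.argmax(spec.mean(axis=0)))
peak_hz = peak_bin * fs / 256
```

Each row of `spec` is one time frame and each column one frequency bin, so the array can be handed to an image-style network as a single-channel image. In this toy signal the strongest average bin corresponds to the 1 kHz tone, while the added noise raises the floor across all bins.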