Jia-Wei Chen, Jia-Hui Li, Yi-Hao Jiang, Yi-Chang Wu, Ying-Hui Lai
JASA Express Letters, vol. 5, no. 4. Published 2025-04-01. DOI: 10.1121/10.0036356 (https://doi.org/10.1121/10.0036356)
Publication type: Journal Article. Impact factor: 1.4. JCR: Q3 (Acoustics). Open access: no. Source: Semantic Scholar.
Citations: 0
Enhancing speech intelligibility in optical microphone systems through physics-informed data augmentation.
Abstract
Laser Doppler vibrometers (LDVs) facilitate noncontact speech acquisition; however, they are prone to material-dependent spectral distortions and speckle noise, which degrade intelligibility in noisy environments. This study proposes a data augmentation method that incorporates material-specific and impulse noises to simulate LDV-induced distortions. The proposed approach uses a gated convolutional neural network with HiFi-GAN to enhance speech intelligibility across various materials and low signal-to-noise ratio (SNR) conditions, achieving a short-time objective intelligibility score of 0.76 at 0 dB SNR. These findings provide valuable insights into optimized augmentation and deep-learning techniques for enhancing LDV-based speech recordings in practical applications.
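
The abstract does not spell out the augmentation recipe, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes that a material-dependent spectral distortion can be approximated by a short FIR filter and that speckle noise can be approximated by sparse random-sign impulses. The names `material_fir`, `impulse_rate`, and `impulse_amp` and their default values are hypothetical.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then add it."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise


def add_impulse_noise(signal, rate, amplitude, rng):
    """Add sparse random-sign impulses (an assumed stand-in for speckle noise)."""
    mask = rng.random(signal.shape) < rate          # which samples get an impulse
    signs = rng.choice([-1.0, 1.0], size=signal.shape)
    return signal + amplitude * mask * signs


def augment(clean, material_fir, noise, snr_db, rng,
            impulse_rate=1e-3, impulse_amp=0.5):
    """Simulate an LDV-like recording from clean speech (illustrative only)."""
    # Assumed model: the reflecting material colors the spectrum like an FIR filter.
    colored = np.convolve(clean, material_fir, mode="same")
    # Background noise is mixed at the requested SNR relative to the colored speech.
    noisy = mix_at_snr(colored, noise, snr_db)
    # Impulses are added last so they are not attenuated by the SNR scaling.
    return add_impulse_noise(noisy, impulse_rate, impulse_amp, rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16000
    t = np.arange(fs) / fs
    clean = np.sin(2 * np.pi * 220 * t)             # placeholder for a speech clip
    noise = rng.standard_normal(fs)                 # placeholder background noise
    low_pass = np.ones(8) / 8                       # hypothetical "material" response
    simulated = augment(clean, low_pass, noise, snr_db=0.0, rng=rng)
    print(simulated.shape)
```

In a training pipeline, pairs of `clean` and `simulated` signals produced this way would serve as targets and inputs for an enhancement model; the filter shapes and impulse statistics would need to be fitted to real LDV measurements rather than chosen by hand as they are here.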