Enhancing speech intelligibility in optical microphone systems through physics-informed data augmentation.

Impact factor: 1.4 · JCR Q3 (ACOUSTICS)

JASA Express Letters, vol. 5, no. 4 · Pub date: 2025-04-01 · DOI: 10.1121/10.0036356

Jia-Wei Chen, Jia-Hui Li, Yi-Hao Jiang, Yi-Chang Wu, Ying-Hui Lai


Citations: 0

Abstract

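Laser Doppler vibrometers (LDVs) facilitate noncontact speech acquisition; however, they are prone to material-dependent spectral distortions and speckle noise, which degrade intelligibility in noisy environments. This study proposes a data augmentation method that incorporates material-specific and impulse noises to simulate LDV-induced distortions. The proposed approach uses a gated convolutional neural network with HiFi-GAN to enhance speech intelligibility across various materials and low signal-to-noise ratio (SNR) conditions, achieving a short-time objective intelligibility (STOI) score of 0.76 at 0 dB SNR. These findings provide insights into optimized augmentation and deep-learning techniques for enhancing LDV-based speech recordings in practical applications.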
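The abstract does not say how the material-specific and impulse noises are generated, so the following is only a minimal sketch of what such an augmentation pipeline could look like, not the authors' implementation. It assumes (1) a surface material's effect can be approximated by an FIR filter (the taps are hypothetical placeholders, not measured responses), (2) speckle-related artifacts can be crudely approximated by sparse random-sign impulses, and (3) additive noise is mixed at a target SNR using the standard definition SNR = 10·log10(P_speech / P_noise). Under that definition, the 0 dB condition in the abstract means speech and noise have equal average power. All function names are hypothetical.

```python
import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add `noise` to `speech`, scaled so that 10*log10(P_speech / P_noise) == snr_db."""
    # Repeat (or truncate) the noise cyclically so it matches the speech length.
    noise = np.resize(noise, speech.shape)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose gain g so that g^2 * p_noise == p_speech / 10^(snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def material_response(x: np.ndarray, taps: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL material model: FIR filtering with `taps` standing in for a
    surface-dependent frequency response (assumes len(x) >= len(taps))."""
    return np.convolve(x, taps, mode="same")


def add_impulses(x: np.ndarray, rate: float, amplitude: float,
                 rng: np.random.Generator) -> np.ndarray:
    """Add random-sign spikes at roughly `rate` hits per sample (crude speckle proxy)."""
    hits = rng.random(x.shape[0]) < rate          # Boolean mask of impulse positions
    signs = rng.choice(np.array([-1.0, 1.0]), size=x.shape[0])
    return x + amplitude * signs * hits


def augment_ldv_like(clean: np.ndarray, noise: np.ndarray, taps: np.ndarray,
                     snr_db: float, impulse_rate: float = 1e-3,
                     impulse_gain: float = 0.5, seed=None) -> np.ndarray:
    """Hypothetical pipeline: material coloring -> additive noise at snr_db -> impulses."""
    rng = np.random.default_rng(seed)
    colored = material_response(clean, taps)
    # SNR is measured against the material-colored speech, not the original.
    noisy = mix_at_snr(colored, noise, snr_db)
    # Impulse amplitude is set relative to the colored speech's peak level.
    peak = np.max(np.abs(colored))
    return add_impulses(noisy, impulse_rate, impulse_gain * peak, rng)
```

For example, `augment_ldv_like(clean, noise, np.ones(8) / 8, 0.0, seed=0)` would apply a simple moving-average low-pass (an arbitrary stand-in for a material) and mix in noise at 0 dB SNR. Keeping each distortion in its own function makes it easy to sample different materials, SNRs, and impulse rates independently when building a training set.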




Source journal

JASA Express Letters

CiteScore

1.70

Self-citation rate

0.00%

Articles published