通过物理数据增强增强光学麦克风系统的语音清晰度。

IF 1.4 Q3 ACOUSTICS
Jia-Wei Chen, Jia-Hui Li, Yi-Hao Jiang, Yi-Chang Wu, Ying-Hui Lai
{"title":"通过物理数据增强增强光学麦克风系统的语音清晰度。","authors":"Jia-Wei Chen, Jia-Hui Li, Yi-Hao Jiang, Yi-Chang Wu, Ying-Hui Lai","doi":"10.1121/10.0036356","DOIUrl":null,"url":null,"abstract":"<p><p>Laser doppler vibrometers (LDVs) facilitate noncontact speech acquisition; however, they are prone to material-dependent spectral distortions and speckle noise, which degrade intelligibility in noisy environments. This study proposes a data augmentation method that incorporates material-specific and impulse noises to simulate LDV-induced distortions. The proposed approach utilizes a gated convolutional neural network with HiFi-GAN to enhance speech intelligibility across various material and low signal-to-noise ratio (SNR) conditions, achieving a short-time objective intelligibility score of 0.76 at 0 dB SNR. These findings provide valuable insights into optimized augmentation and deep-learning techniques for enhancing LDV-based speech recordings in practical applications.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"5 4","pages":""},"PeriodicalIF":1.4000,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Enhancing speech intelligibility in optical microphone systems through physics-informed data augmentation.\",\"authors\":\"Jia-Wei Chen, Jia-Hui Li, Yi-Hao Jiang, Yi-Chang Wu, Ying-Hui Lai\",\"doi\":\"10.1121/10.0036356\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Laser doppler vibrometers (LDVs) facilitate noncontact speech acquisition; however, they are prone to material-dependent spectral distortions and speckle noise, which degrade intelligibility in noisy environments. This study proposes a data augmentation method that incorporates material-specific and impulse noises to simulate LDV-induced distortions. The proposed approach utilizes a gated convolutional neural network with HiFi-GAN to enhance speech intelligibility across various material and low signal-to-noise ratio (SNR) conditions, achieving a short-time objective intelligibility score of 0.76 at 0 dB SNR. These findings provide valuable insights into optimized augmentation and deep-learning techniques for enhancing LDV-based speech recordings in practical applications.</p>\",\"PeriodicalId\":73538,\"journal\":{\"name\":\"JASA express letters\",\"volume\":\"5 4\",\"pages\":\"\"},\"PeriodicalIF\":1.4000,\"publicationDate\":\"2025-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JASA express letters\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1121/10.0036356\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ACOUSTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JASA express letters","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1121/10.0036356","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ACOUSTICS","Score":null,"Total":0}
引用次数: 0

摘要

激光多普勒测振仪(ldv)促进非接触语音采集;然而,它们容易产生与材料相关的光谱失真和散斑噪声,从而降低噪声环境中的可理解性。本研究提出了一种数据增强方法,该方法结合了材料特异性和脉冲噪声来模拟ldv引起的扭曲。该方法利用带有HiFi-GAN的门控卷积神经网络来提高各种材料和低信噪比(SNR)条件下的语音可理解性,在0 dB SNR下实现了0.76的短期客观可理解性评分。这些发现为优化增强和深度学习技术在实际应用中增强基于ldv的语音记录提供了有价值的见解。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Enhancing speech intelligibility in optical microphone systems through physics-informed data augmentation.

Laser doppler vibrometers (LDVs) facilitate noncontact speech acquisition; however, they are prone to material-dependent spectral distortions and speckle noise, which degrade intelligibility in noisy environments. This study proposes a data augmentation method that incorporates material-specific and impulse noises to simulate LDV-induced distortions. The proposed approach utilizes a gated convolutional neural network with HiFi-GAN to enhance speech intelligibility across various material and low signal-to-noise ratio (SNR) conditions, achieving a short-time objective intelligibility score of 0.76 at 0 dB SNR. These findings provide valuable insights into optimized augmentation and deep-learning techniques for enhancing LDV-based speech recordings in practical applications.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
1.70
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信