Deep Preprocessing Method for Speech Restoration in Parametric Array Loudspeakers via Time-Frequency Domain Modeling

Impact factor 3.9 · Zone 2, Engineering & Technology · JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC
Wenyao Ma;Yunxi Zhu;Jun Yang
Journal article, IEEE Signal Processing Letters, vol. 32, pp. 3720-3724, published 2025-09-11.
DOI: 10.1109/LSP.2025.3609247
IEEE Xplore: https://ieeexplore.ieee.org/document/11159167/
Citations: 0

Abstract

The parametric array loudspeaker inherently introduces baseband distortions in directional sound applications due to the nonlinear process in air. Recently, DNNs have been used to model this forward process and to generate preprocessed signals for distortion-free speech restoration. However, when trained on real-world audio, the preprocessing network can exploit weaknesses in the forward model, producing adversarial outputs. To address this issue, we propose a reorganization strategy for the two-stage framework, comprising a causal TF-GridNet for preprocessed signal generation and a modified time-frequency (T-F) domain differential Volterra Filter (DiffVF) as the forward model. The causal TF-GridNet estimates real and imaginary components using a T-F band-split mechanism. The modified forward model integrates the second-order difference and kernel convolution operations of the original time-domain version into the T-F domain, preserving interpretability while stabilizing training. A refined $N$th-order equalization, based on the T-F domain DiffVF model, is implemented as a competitive baseline. Simulated and real-world experiments demonstrate state-of-the-art reconstruction performance of the proposed method across various objective metrics.
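The abstract names a nonlinear forward process, a second-order difference, and an $N$th-order equalization baseline without giving equations. The sketch below is a toy, memoryless illustration under stated assumptions, not the paper's method: it uses the classic Berktay far-field approximation, in which the demodulated audio is proportional to the second time derivative of the squared AM envelope, discretized here as a second-order difference. Squaring the envelope 1 + m·x produces the wanted term 2m·x plus the baseband distortion m²·x². The helper `nth_order_equalize` is a hypothetical fixed-point pre-inverse standing in for the kind of equalization the abstract mentions; it is not the authors' refined version, and nothing here models the learned Volterra kernels, the T-F domain DiffVF, or TF-GridNet. All function names and the values of `m`, `fs` and the test tone are illustrative assumptions.

```python
import math

def second_difference(s):
    """Valid part of the second-order difference s[n] - 2*s[n-1] + s[n-2]."""
    return [s[n] - 2.0 * s[n - 1] + s[n - 2] for n in range(2, len(s))]

def pal_forward(u, m):
    """Memoryless Berktay-style PAL model (simplifying assumption):
    baseband output ~ second difference of the squared AM envelope (1 + m*u)^2."""
    return second_difference([(1.0 + m * v) ** 2 for v in u])

def ideal_response(x, m):
    """Distortion-free target: only the linear envelope term 2*m*x survives."""
    return second_difference([2.0 * m * v for v in x])

def nth_order_equalize(x, m, order):
    """Hypothetical N-step pre-inverse via the fixed-point update
    u <- x - (m/2) * u^2, which drives 2*m*u + m^2*u^2 toward 2*m*x.
    Each step pushes the residual distortion to one higher power of m."""
    u = list(x)
    for _ in range(order):
        u = [xv - 0.5 * m * uv * uv for xv, uv in zip(x, u)]
    return u

def sram(x, m):
    """Closed-form square-root AM: (1 + m*u)^2 equals 1 + 2*m*x exactly."""
    return [(math.sqrt(1.0 + 2.0 * m * v) - 1.0) / m for v in x]

def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

# Toy input: a 440 Hz tone at 8 kHz; m and the amplitude keep 1 + 2*m*x > 0.
m, fs = 0.3, 8000
x = [0.8 * math.sin(2.0 * math.pi * 440.0 * n / fs) for n in range(400)]
target = ideal_response(x, m)

# RMS baseband distortion after N fixed-point equalization steps.
errors = {k: rms_error(pal_forward(nth_order_equalize(x, m, k), m), target)
          for k in (0, 1, 2, 4)}
sram_error = rms_error(pal_forward(sram(x, m), m), target)

for k, e in errors.items():
    print(f"order {k}: RMS distortion {e:.2e}")
print(f"SRAM: RMS distortion {sram_error:.2e}")
```

Running it should show the RMS distortion falling as `order` grows, while square-root AM (SRAM) matches the target up to floating-point rounding. The closed form only exists because this toy model is memoryless; in a real device the square root also widens the modulated bandwidth beyond what ultrasonic transducers pass, and the forward process has frequency-dependent behavior that this sketch ignores.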
Source journal: IEEE Signal Processing Letters
Subject category: Engineering & Technology, Engineering: Electrical & Electronic
CiteScore: 7.40
Self-citation rate: 12.80%
Articles published: 339
Review time: 2.8 months
About the journal: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.