Deep Preprocessing Method for Speech Restoration in Parametric Array Loudspeakers via Time-Frequency Domain Modeling

Impact factor: 3.9 | CAS Zone 2 (Engineering & Technology) | JCR Q2 (Engineering, Electrical & Electronic)

IEEE Signal Processing Letters | Pub Date: 2025-09-11 | DOI: 10.1109/LSP.2025.3609247

Wenyao Ma, Yunxi Zhu, Jun Yang

Published in IEEE Signal Processing Letters, vol. 32, pp. 3720-3724 (journal article, not open access). Full text: https://ieeexplore.ieee.org/document/11159167/

Citations: 0

Abstract

The parametric array loudspeaker inherently introduces baseband distortions in directional sound applications due to the nonlinear process in air. Recently, DNNs have been used to model this forward process and to generate preprocessed signals for distortion-free speech restoration. However, when trained on real-world audio, the preprocessing network can exploit weaknesses in the forward model, producing adversarial outputs. To address it, we propose a reorganization strategy for the two-stage framework, comprising a causal TF-GridNet for preprocessed signal generation and a modified time-frequency (T-F) domain differential Volterra Filter (DiffVF) as the forward model. The causal TF-GridNet estimates real and imaginary components using a T-F band-split mechanism. The modified forward model integrates the second-order difference and kernel convolution operations of the original time-domain version into the T-F domain, preserving interpretability while stabilizing training. A refined $N$th-order equalization, based on the T-F domain DiffVF model, is implemented as a competitive baseline. Simulated and real-world experiments demonstrate state-of-the-art reconstruction performance of the proposed method across various objective metrics.
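To make the notions of "baseband distortion" and "preprocessing" concrete, the following is a minimal sketch that assumes the classical Berktay far-field approximation, in which the audible output is proportional to the second time derivative of the squared modulation envelope. It is not the DiffVF forward model or the TF-GridNet preprocessor described in the abstract; the function names `berktay_demodulate` and `harmonic_amplitude`, the 500 Hz test tone, and the modulation index are all illustrative assumptions. The sketch compares plain amplitude modulation with the textbook square-root preprocessing.

```python
import numpy as np

def berktay_demodulate(envelope, fs):
    """Assumed simplified forward model (Berktay approximation):
    output is proportional to d^2/dt^2 of the squared envelope.
    The derivative is approximated by a second-order difference."""
    sq = envelope ** 2
    d2 = np.zeros_like(sq)
    # Central second-order difference; the two edge samples are left at zero.
    d2[1:-1] = (sq[2:] - 2.0 * sq[1:-1] + sq[:-2]) * fs ** 2
    return d2

def harmonic_amplitude(signal, fs, freq):
    """Magnitude of the DFT bin closest to `freq`."""
    spectrum = np.fft.rfft(signal) / (len(signal) / 2)
    k = int(round(freq * len(signal) / fs))
    return np.abs(spectrum[k])

fs = 16000                      # sample rate in Hz (illustrative)
n = 16000                       # one second, so 500 Hz falls exactly on a bin
t = np.arange(n) / fs
f0 = 500.0                      # test tone frequency (illustrative)
m = 0.8                         # modulation index (illustrative)
x = np.sin(2 * np.pi * f0 * t)

env_plain = 1.0 + m * x             # no preprocessing (plain AM envelope)
env_sqrt = np.sqrt(1.0 + m * x)     # square-root preprocessing

y_plain = berktay_demodulate(env_plain, fs)
y_sqrt = berktay_demodulate(env_sqrt, fs)

# Ratio of second-harmonic to fundamental amplitude in each output.
ratio_plain = harmonic_amplitude(y_plain, fs, 2 * f0) / harmonic_amplitude(y_plain, fs, f0)
ratio_sqrt = harmonic_amplitude(y_sqrt, fs, 2 * f0) / harmonic_amplitude(y_sqrt, fs, f0)

print(f"2nd harmonic / fundamental, plain AM:        {ratio_plain:.3f}")
print(f"2nd harmonic / fundamental, sqrt preprocess: {ratio_sqrt:.2e}")
```

Under this idealized model, squaring the plain envelope creates a term in the square of the input, which shows up as a strong second harmonic, while the square-root envelope cancels it. Real parametric arrays deviate from this approximation, which is the kind of gap that learned forward models and preprocessors such as those in the abstract aim to address.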

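Source journal

IEEE Signal Processing Letters (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.40

Self-citation rate: 12.80%

Articles published: 339

Review time: 2.8 months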

About the journal: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.