A dual-branch deep interaction network for multi-channel speech enhancement

IF 5.5; Zone 2, Computer Science; Q1 Computer Science, Artificial Intelligence

Neurocomputing Pub Date : 2025-05-15 DOI:10.1016/j.neucom.2025.130412

Xiaoyu Lian, Nan Xia, Gaole Dai, Hongqin Yang

Citations: 0

Abstract

Multi-channel speech enhancement removes noise and reverberation from noisy speech signals captured by microphone arrays. In this paper, we propose a dual-branch deep interaction network (DBDINet) for multi-channel speech enhancement that combines complementary features of the speech signal from the time domain and the time–frequency domain. We design a waveform and complex spectrum interaction module (WCIM) that enables deep interaction between the information of the two domains, and we propose an efficient Conformer (eConformer) as a transition layer to improve network efficiency. We conducted extensive experiments on the synthetic AISHELL-1 dataset and the CHiME-3 dataset. The experimental results show that the proposed method achieves competitive performance on several metrics while maintaining lower computational complexity and faster inference speed than existing advanced methods.
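To make the two domains concrete, the following minimal sketch prepares both views of a multi-channel recording: the raw waveform for a time-domain branch and the complex spectrum (real and imaginary parts) for a time–frequency branch. It is not the DBDINet implementation; the channel count, sampling rate, FFT size, hop length, and all names (`stft`, `waveform_input`, `spectrum_input`) are assumptions chosen for illustration, and the input is random noise rather than real data.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Complex STFT of a multi-channel signal x with shape (channels, samples).

    Illustrative helper: Hann window, no padding, one-sided spectrum.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (x.shape[-1] - n_fft) // hop
    # Index matrix of overlapping frames: shape (n_frames, n_fft)
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[:, idx] * window          # (channels, n_frames, n_fft)
    return np.fft.rfft(frames, axis=-1)  # (channels, n_frames, n_fft // 2 + 1)

# Hypothetical input: 6 microphones, 1 second at 16 kHz (random placeholder).
rng = np.random.default_rng(0)
n_channels, n_samples = 6, 16000
noisy = rng.standard_normal((n_channels, n_samples))

# Time-domain view: the waveform itself.
waveform_input = noisy

# Time-frequency view: real and imaginary parts stacked on a new axis,
# giving shape (channels, 2, frames, frequency bins).
spec = stft(noisy)
spectrum_input = np.stack([spec.real, spec.imag], axis=1)
```

A dual-branch model would consume `waveform_input` and `spectrum_input` in parallel; how the branches exchange information is the role the abstract assigns to the WCIM, which this sketch does not attempt to reproduce.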




Source journal

Neurocomputing (Engineering and Technology, Computer Science: Artificial Intelligence)

CiteScore: 13.10
Self-citation rate: 10.00%
Articles published: 1382
Review time: 70 days

Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Its essential topics are neurocomputing theory, practice, and applications.