{"title":"Self-Supervised Voice Denoising Network for Multi-Scenario Human-Robot Interaction.","authors":"Mu Li, Wenjin Xu, Chao Zeng, Ning Wang","doi":"10.3390/biomimetics10090603","DOIUrl":null,"url":null,"abstract":"<p><p>Human-robot interaction (HRI) via voice command has significantly advanced in recent years, with large Vision-Language-Action (VLA) models demonstrating particular promise in human-robot voice interaction. However, these systems still struggle with environmental noise contamination during voice interaction and lack a specialized denoising network for multi-speaker command isolation in an overlapping speech scenario. To overcome these challenges, we introduce a method to enhance voice command-based HRI in noisy environments, leveraging synthetic data and a self-supervised denoising network to enhance its real-world applicability. Our approach focuses on improving self-supervised network performance in denoising mixed-noise audio through training data scaling. Extensive experiments show our method outperforms existing approaches in simulation and achieves 7.5% higher accuracy than the state-of-the-art method in noisy real-world environments, enhancing voice-guided robot control.</p>","PeriodicalId":8907,"journal":{"name":"Biomimetics","volume":"10 9","pages":""},"PeriodicalIF":3.9000,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12467738/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biomimetics","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.3390/biomimetics10090603","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
Abstract
Human-robot interaction (HRI) via voice commands has advanced significantly in recent years, with large Vision-Language-Action (VLA) models showing particular promise for voice-based interaction. However, these systems still struggle with environmental noise contamination during voice interaction and lack a specialized denoising network for isolating individual commands in overlapping multi-speaker scenarios. To overcome these challenges, we introduce a method for voice command-based HRI in noisy environments that leverages synthetic data and a self-supervised denoising network to improve real-world applicability. Our approach focuses on improving the self-supervised network's performance on mixed-noise audio by scaling the training data. Extensive experiments show that our method outperforms existing approaches in simulation and achieves 7.5% higher accuracy than the state-of-the-art method in noisy real-world environments, enhancing voice-guided robot control.
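To make the data-synthesis idea concrete, the sketch below shows one way synthetic mixed-noise training examples could be generated for a self-supervised denoiser: a clean voice command is overlaid with environmental noise and an interfering speaker at randomly drawn SNRs. The function names, SNR ranges, and the Noise2Noise-style pairing are illustrative assumptions for this sketch, not the authors' actual pipeline.

```python
# Minimal sketch (assumptions noted above): synthesizing noisy mixtures for
# training a denoising network without clean targets at training time.
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix a clean command with a noise signal at a target SNR (in dB)."""
    # Loop or trim the noise so it covers the whole command.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]
    # Scale the noise so that speech_power / noise_power matches the target SNR.
    speech_power = np.mean(speech ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

def make_selfsup_pair(command, env_noise, interfering_speech, rng):
    """Build one training pair: two independently corrupted views of the same
    command (a Noise2Noise-style pairing, assumed here) serve as network input
    and target, so no clean reference is required during training."""
    def noisy_view():
        x = mix_at_snr(command, env_noise, snr_db=rng.uniform(0.0, 15.0))
        return mix_at_snr(x, interfering_speech, snr_db=rng.uniform(0.0, 10.0))
    return noisy_view(), noisy_view()

# Usage example with random placeholder waveforms (16 kHz, 1 s clips).
rng = np.random.default_rng(0)
command = rng.standard_normal(16000)
env_noise = rng.standard_normal(16000)
interferer = rng.standard_normal(16000)
net_input, net_target = make_selfsup_pair(command, env_noise, interferer, rng)
```

Scaling the training data, as the abstract describes, would then amount to drawing many such mixtures across noise types, interfering speakers, and SNR levels.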