Multi-Noise Representation Learning for Robust Speaker Recognition

IF 3.2 | Zone 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Signal Processing Letters | Pub Date: 2025-01-16 | DOI: 10.1109/LSP.2025.3530879

Sunyoung Cho; Kyungchul Wee

Citations: 0

Abstract

Speaker recognition in noisy environments remains challenging because highly variable noise hinders convergence to an optimal solution. To address the information discrepancies that noise variability introduces during training, we explore a multi-modal learning scheme that treats different types of noise as distinct modalities. We propose a multi-noise representation learning method that extracts embeddings encoding both the discriminative characteristics of each noise type and the commonalities integrated across noise types. Specifically, the multi-noise learning network is trained jointly with an embedding extractor so that features refined under noisy conditions are continuously incorporated into the speaker embeddings. Experiments on VoxCeleb1 demonstrate that the proposed method is effective when used in conjunction with embedding extractors, outperforming state-of-the-art methods in noisy conditions.
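
The abstract describes the method only at a high level. The Python sketch below is a toy forward pass illustrating the general idea of combining noise-specific features with commonalities shared across noise types; it is not the authors' network. Everything in it is an assumption: the noise categories, the embedding size, the random weights standing in for learned parameters, the hypothetical `ToyMultiNoiseHead` class, mean pooling as the "integrated commonality", and additive fusion. The joint training with the embedding extractor is omitted entirely.

```python
import math
import random

DIM = 8  # toy embedding size (assumption)
NOISE_TYPES = ("babble", "music", "ambient")  # hypothetical noise categories


def random_matrix(rng, rows, cols):
    # Small Gaussian weights standing in for learned parameters.
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]


def linear(weights, x):
    # Matrix-vector product; weights is a list of rows.
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def l2_normalize(x):
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]


class ToyMultiNoiseHead:
    """Hypothetical head: one projection per noise type plus one shared projection."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.specific = {n: random_matrix(rng, DIM, DIM) for n in NOISE_TYPES}
        self.shared = random_matrix(rng, DIM, DIM)

    def __call__(self, noisy_embeddings):
        # noisy_embeddings maps noise type -> extractor output for the same utterance.
        shared_feats = [relu(linear(self.shared, e)) for e in noisy_embeddings.values()]
        # Commonality: element-wise mean of the shared-branch outputs across noise types.
        common = [sum(col) / len(shared_feats) for col in zip(*shared_feats)]
        fused = {}
        for noise, emb in noisy_embeddings.items():
            specific = relu(linear(self.specific[noise], emb))
            # Additive fusion is an assumption, not the paper's stated rule.
            fused[noise] = l2_normalize([e + s + c for e, s, c in zip(emb, specific, common)])
        return fused


def cosine(a, b):
    # Inputs are unit-length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(1)
    clean = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    # Simulate the same utterance under each noise type with a small perturbation.
    noisy = {n: [v + rng.gauss(0.0, 0.1) for v in clean] for n in NOISE_TYPES}
    out = ToyMultiNoiseHead()(noisy)
    print(round(cosine(out["babble"], out["music"]), 3))
```

In a real system the projections would presumably be learned jointly with the embedding extractor under a speaker-discriminative loss; that training loop, and any scoring back end, are assumptions not specified in the abstract.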

Source journal

IEEE Signal Processing Letters (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.40
Self-citation rate: 12.80%
Articles published: 339
Review time: 2.8 months

Journal description: IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language, and audio processing. Papers published in the Letters can be presented within one year of their appearance at signal processing conferences such as ICASSP, GlobalSIP, and ICIP, as well as at several workshops organized by the Signal Processing Society.