Multi-Noise Representation Learning for Robust Speaker Recognition
Authors: Sunyoung Cho; Kyungchul Wee
Journal: IEEE Signal Processing Letters, vol. 32, pp. 681-685
Publication date: 2025-01-16
Publication type: Journal Article
DOI: 10.1109/LSP.2025.3530879
URL: https://ieeexplore.ieee.org/document/10843837/
Impact factor: 3.2 (JCR Q2, Engineering, Electrical & Electronic)
Citations: 0
Abstract
Speaker recognition in noisy environments remains a challenging issue due to highly variable noise, which hinders convergence to an optimal solution. To address the information discrepancies caused by noise variability during the training process, we explore a multi-modal learning scheme by treating different types of noise as distinct modalities. We propose a multi-noise representation learning method to extract embeddings that encode discriminative characteristics for each noise type, along with integrated commonalities from various types of noise. Specifically, the multi-noise learning network is jointly trained with an embedding extractor to continuously incorporate refined features under noisy conditions into the speaker embeddings. Experiments on VoxCeleb1 demonstrate that the proposed method is effective when used in conjunction with embedding extractors, outperforming state-of-the-art methods in noisy conditions.
About the journal:
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.