Investigation of DNN based Feature Enhancement Jointly Trained with X-Vectors for Noise-Robust Speaker Verification

2020 International Conference on Electronics, Information, and Communication (ICEIC). Pub Date: 2020-01-01. DOI: 10.1109/ICEIC49074.2020.9051093

Joon-Young Yang, Kwan-Ho Park, Joon‐Hyuk Chang, Youngsam Kim, Sangrae Cho

Citations: 1

Abstract

In this paper, we investigate deep neural network (DNN) based feature enhancement as the denoising front end of the x-vector speaker verification framework in noisy environments. First, the feature enhancement DNN (FE-DNN) learns the mapping from noisy to clean speech in the frame-level acoustic feature domain, and the x-vector network (XvectorNet) is then trained on top of the enhanced features. Finally, the separately trained FE-DNN and XvectorNet are concatenated in series and jointly trained under the supervision of the cross-entropy loss. In addition, we adopt the logistic margin softmax layer for training the XvectorNet in order to obtain more discriminative speaker embeddings.
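The three stages described in the abstract can be illustrated with a forward-pass-only sketch in plain Python. It is not the authors' implementation: the layer sizes, the mean-squared-error loss for the first stage, the mean-and-standard-deviation pooling, and all function names are assumptions made for illustration, and a plain softmax cross-entropy stands in for the logistic margin softmax layer, whose exact form the abstract does not give.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one affine layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def affine(x, layer):
    """Compute W x + b for a single feature vector."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def fe_dnn(noisy_frames, layers):
    """Frame-level enhancement: map each noisy frame to an enhanced frame."""
    hidden_layer, output_layer = layers
    return [affine(relu(affine(f, hidden_layer)), output_layer) for f in noisy_frames]

def stats_pool(frames):
    """Concatenate per-dimension mean and standard deviation over time (assumed pooling)."""
    n = len(frames)
    dims = range(len(frames[0]))
    mean = [sum(f[d] for f in frames) / n for d in dims]
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / n) for d in dims]
    return mean + std

def xvector_net(frames, layers):
    """Frame-level layer, pooling over time, then utterance-level speaker logits."""
    frame_layer, speaker_layer = layers
    hidden = [relu(affine(f, frame_layer)) for f in frames]
    return affine(stats_pool(hidden), speaker_layer)

def cross_entropy(logits, label):
    """Softmax cross-entropy, using the log-sum-exp trick for stability."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_norm - logits[label]

def mse(pred_frames, target_frames):
    """Mean squared error over all frames and dimensions (assumed stage-1 loss)."""
    diffs = [(p - t) ** 2
             for pf, tf in zip(pred_frames, target_frames)
             for p, t in zip(pf, tf)]
    return sum(diffs) / len(diffs)

rng = random.Random(0)
FEAT_DIM, HIDDEN, N_SPEAKERS, N_FRAMES = 4, 8, 3, 10  # toy sizes, not from the paper

fe_layers = (init_layer(FEAT_DIM, HIDDEN, rng), init_layer(HIDDEN, FEAT_DIM, rng))
xv_layers = (init_layer(FEAT_DIM, HIDDEN, rng), init_layer(2 * HIDDEN, N_SPEAKERS, rng))

# Synthetic "clean" features and a noisy copy of them (random data, for shape only).
clean = [[rng.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(N_FRAMES)]
noisy = [[c + rng.gauss(0, 0.5) for c in frame] for frame in clean]
speaker = 1  # hypothetical speaker label

enhanced = fe_dnn(noisy, fe_layers)
stage1_loss = mse(enhanced, clean)            # stage 1: FE-DNN alone, noisy -> clean
logits = xvector_net(enhanced, xv_layers)
joint_loss = cross_entropy(logits, speaker)   # stages 2 and 3: speaker classification
print(stage1_loss, joint_loss)
```

In the second stage only the XvectorNet parameters would be updated from `joint_loss`; in the joint stage the same loss would also be backpropagated into the FE-DNN parameters, which requires an automatic differentiation framework that this sketch leaves out.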

