Investigation of DNN based Feature Enhancement Jointly Trained with X-Vectors for Noise-Robust Speaker Verification

2020 International Conference on Electronics, Information, and Communication (ICEIC); Pub Date: 2020-01-01; DOI: 10.1109/ICEIC49074.2020.9051093

Joon-Young Yang, Kwan-Ho Park, Joon‐Hyuk Chang, Youngsam Kim, Sangrae Cho

Citations: 1

Abstract

In this paper, we investigate deep neural network (DNN)-based feature enhancement as the denoising frontend of the x-vector speaker verification framework in noisy environments. First, the feature enhancement DNN (FE-DNN) learns the mapping from noisy to clean corpora in the frame-level acoustic feature domain, and the x-vector network (XvectorNet) is then trained on top of the enhanced features. Finally, the separately trained FE-DNN and XvectorNet are serially concatenated and jointly trained under the supervision of a cross-entropy loss. In addition, we adopt the logistic margin softmax layer for training the XvectorNet in order to obtain more discriminative speaker embeddings.
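
The three training stages described in the abstract can be sketched in a toy setting. The block below is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. The FE-DNN is replaced by a hypothetical per-dimension affine map, and the XvectorNet by mean and standard deviation pooling followed by a linear softmax classifier. The stage-1 loss is assumed to be mean squared error, since the abstract does not name it. Plain softmax cross-entropy stands in for the logistic margin softmax, and gradients come from finite differences. All function names, data, and hyperparameters are invented for illustration.

```python
import math

DIM, N_SPK = 2, 2  # toy sizes (assumed): feature dimension, number of speakers

def enhance(frames, fe):
    # Hypothetical stand-in for the FE-DNN: per-dimension affine map on each frame.
    w, b = fe[:DIM], fe[DIM:]
    return [[w[d] * x[d] + b[d] for d in range(DIM)] for x in frames]

def stats_pool(frames):
    # Statistics pooling over time: concatenated mean and standard deviation.
    n = len(frames)
    mean = [sum(f[d] for f in frames) / n for d in range(DIM)]
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / n + 1e-6)
           for d in range(DIM)]
    return mean + std

def cross_entropy(pooled, cls, label):
    # Plain softmax cross-entropy; the logistic margin softmax is NOT reproduced.
    width = 2 * DIM
    logits = [sum(cls[k * width + j] * pooled[j] for j in range(width))
              for k in range(N_SPK)]
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[label]

def mse(fe, pairs):
    # Assumed stage-1 objective: mean squared error, enhanced vs. clean frames.
    total, count = 0.0, 0
    for noisy_utt, clean_utt in pairs:
        for e, c in zip(enhance(noisy_utt, fe), clean_utt):
            total += sum((ei - ci) ** 2 for ei, ci in zip(e, c))
            count += DIM
    return total / count

def speaker_loss(fe, cls, data):
    # Cross-entropy through the serial chain: frontend -> pooling -> classifier.
    return sum(cross_entropy(stats_pool(enhance(x, fe)), cls, y)
               for x, y in data) / len(data)

def num_grad(f, params, eps=1e-5):
    # Central finite differences; acceptable for a toy, not for real networks.
    grad = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def sgd(f, params, lr, steps):
    # Full-batch gradient descent on f, starting from params.
    for _ in range(steps):
        params = [p - lr * g for p, g in zip(params, num_grad(f, params))]
    return params

# Toy corpus (invented): one 4-frame clean utterance per speaker, plus a distorted copy.
clean = [[[1.0, 0.0], [1.2, 0.2], [0.8, -0.2], [1.0, 0.0]],
         [[-1.0, 0.5], [-1.2, 0.9], [-0.8, 0.1], [-1.0, 0.5]]]
noisy = [[[0.5 * v + 0.3 for v in frame] for frame in utt] for utt in clean]
pairs = list(zip(noisy, clean))
data = [(noisy[0], 0), (noisy[1], 1)]

# Stage 1: train the frontend alone to map noisy frames to clean frames.
fe0 = [1.0, 1.0, 0.0, 0.0]
fe1 = sgd(lambda p: mse(p, pairs), fe0, lr=0.1, steps=200)

# Stage 2: train the speaker classifier on the frozen, enhanced features.
cls0 = [0.0] * (N_SPK * 2 * DIM)
cls1 = sgd(lambda c: speaker_loss(fe1, c, data), cls0, lr=0.2, steps=50)

# Stage 3: concatenate both modules and update all parameters with cross-entropy.
n_fe = 2 * DIM

def joint(p):
    return speaker_loss(p[:n_fe], p[n_fe:], data)

theta = sgd(joint, fe1 + cls1, lr=0.05, steps=50)
fe2, cls2 = theta[:n_fe], theta[n_fe:]

print("stage 1 MSE:", mse(fe0, pairs), "->", mse(fe1, pairs))
print("joint cross-entropy:", joint(fe1 + cls1), "->", joint(theta))
```

In stage 3 the frontend parameters are updated by the speaker classification loss alone, so the enhanced features are no longer tied to the clean targets and can drift toward whatever representation best separates the speakers.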
