TDMF: Task-Driven Multilevel Framework for End-to-End Speaker Verification

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); Pub Date: 2020-05-01; DOI: 10.1109/ICASSP40776.2020.9052957

Chen Chen, Jiqing Han

Citations: 1

Abstract

In this paper, a task-driven multilevel framework (TDMF) is proposed for end-to-end speaker verification. The TDMF has four layers, each of which acts on the speaker models or representations in a different way; together they implement the functions of the universal background model (UBM), Gaussian mixture model (GMM), total variability model (TVM), and probabilistic linear discriminant analysis (PLDA). Unlike the typical i-vector method, the proposed TDMF supervises each phase (layer) so that its solution is optimized in the direction required by the PLDA classifier. Moreover, unlike most end-to-end neural network approaches, which first extract embeddings and then compute the distance between two embeddings as the verification score, the TDMF provides scores directly via its fourth-layer PLDA. Experimental results show that the TDMF outperforms both the typical i-vector framework and the VGG-M convolutional neural network (CNN) framework.
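The contrast the abstract draws between distance-based scoring of embeddings and PLDA scoring can be illustrated with a minimal sketch. This is not the authors' TDMF: it assumes a simplified two-covariance Gaussian PLDA model (zero mean, between-speaker covariance `B`, within-speaker covariance `W`), and the function names and toy matrices are hypothetical, chosen only for illustration.

```python
import numpy as np


def cosine_score(e1, e2):
    """Distance-style score: cosine similarity between two embeddings."""
    return float(e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2)))


def _gauss_logpdf_zero_mean(x, cov):
    """Log density of a zero-mean multivariate Gaussian evaluated at x."""
    d = x.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)


def plda_llr(x1, x2, B, W):
    """Log-likelihood ratio: same speaker vs. different speakers.

    Assumed model (not taken from the paper): x = y + e,
    with y ~ N(0, B) the speaker factor and e ~ N(0, W) the residual.
    """
    T = B + W                      # total covariance of one vector
    Z = np.zeros_like(B)
    same = np.block([[T, B], [B, T]])   # shared speaker factor couples x1, x2
    diff = np.block([[T, Z], [Z, T]])   # independent speakers
    x = np.concatenate([x1, x2])
    return float(_gauss_logpdf_zero_mean(x, same)
                 - _gauss_logpdf_zero_mean(x, diff))


# Toy, hand-picked covariances (assumptions, not estimated from data).
B = 4.0 * np.eye(2)
W = np.eye(2)
a = np.array([2.0, 0.0])

llr_same = plda_llr(a, a, B, W)    # identical vectors
llr_diff = plda_llr(a, -a, B, W)   # opposite vectors
```

With these toy values the log-likelihood ratio is positive for the matching pair and negative for the opposing pair, so the score can be thresholded directly, whereas the cosine score needs a separately chosen distance threshold.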


