Logistic Similarity Metric Learning via Affinity Matrix for Text-Independent Speaker Verification
Junyi Peng, Rongzhi Gu, Yuexian Zou
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2019
DOI: 10.1109/ASRU46091.2019.9003995
Citations: 1
Abstract
This paper proposes a novel objective function, called Logistic Affinity Loss (Logistic-AL), to optimize end-to-end speaker verification models. Specifically, the cosine similarities of all pairs of speaker embeddings in a mini-batch are first passed through a learnable logistic regression layer, yielding a probability estimate for each pair. Then, the supervision target for each pair is derived from the corresponding one-hot speaker labels, indicating whether the two embeddings belong to the same speaker. Finally, the model is optimized with the binary cross-entropy between the predicted probabilities and these targets. In contrast to other distance metric learning methods, which push the distances of similar/dissimilar pairs toward pre-defined targets, Logistic-AL learns a decision boundary that separates similar pairs from dissimilar ones. Experimental results on the VoxCeleb1 dataset show that an x-vector feature extractor optimized with Logistic-AL achieves state-of-the-art performance.
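The loss described in the abstract can be sketched as follows. This is a minimal PyTorch illustration, not the authors' implementation: it assumes the learnable logistic regression layer is a scalar slope `a` and bias `b` applied to the pairwise cosine similarities (these parameter names and the diagonal masking are assumptions for illustration).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LogisticAffinityLoss(nn.Module):
    """Sketch of Logistic Affinity Loss (Logistic-AL).

    Assumes the learnable logistic regression layer has the form
    sigmoid(a * cos_sim + b); `a` and `b` are illustrative names.
    """

    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.tensor(1.0))  # learnable slope
        self.b = nn.Parameter(torch.tensor(0.0))  # learnable bias (decision boundary)

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarities of all pairs in the mini-batch: (B, B) matrix.
        normed = F.normalize(embeddings, dim=1)
        sim = normed @ normed.t()

        # Pairwise binary targets from the speaker labels:
        # 1 if the pair belongs to the same speaker, else 0.
        target = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()

        # Learnable logistic regression layer on the similarities.
        logits = self.a * sim + self.b

        # Exclude trivial self-pairs on the diagonal (an assumption here).
        mask = ~torch.eye(len(labels), dtype=torch.bool, device=sim.device)

        # Binary cross-entropy between predicted probabilities and targets.
        return F.binary_cross_entropy_with_logits(logits[mask], target[mask])
```

Because `a` and `b` are trained jointly with the embedding network, the decision boundary between similar and dissimilar pairs is learned rather than fixed to a pre-defined margin.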