Logistic Similarity Metric Learning via Affinity Matrix for Text-Independent Speaker Verification
Junyi Peng, Rongzhi Gu, Yuexian Zou
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2019
DOI: 10.1109/ASRU46091.2019.9003995
Citations: 1
Abstract
This paper proposes a novel objective function, called Logistic Affinity Loss (Logistic-AL), to optimize end-to-end speaker verification models. Specifically, the cosine similarities of all pairs of speaker embeddings in a mini-batch are first passed through a learnable logistic regression layer, yielding a probability estimate for each pair. Then, the supervision target for each pair is derived from the corresponding one-hot speaker labels, indicating whether the two embeddings belong to the same speaker. Finally, the model is optimized with the binary cross-entropy between the predicted probabilities and these targets. In contrast to other distance metric learning methods, which push the distances of similar/dissimilar pairs toward pre-defined targets, Logistic-AL learns a decision boundary that separates similar pairs from dissimilar ones. Experimental results on the VoxCeleb1 dataset show that an x-vector feature extractor optimized with Logistic-AL achieves state-of-the-art performance.
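The loss described in the abstract can be sketched as follows. This is a minimal PyTorch illustration, not the authors' implementation: it assumes the learnable logistic regression layer is a scalar slope `a` and bias `b` applied to the pairwise cosine similarities (these parameter names and the diagonal masking are assumptions for illustration).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LogisticAffinityLoss(nn.Module):
    """Sketch of Logistic Affinity Loss (Logistic-AL).

    Assumes the learnable logistic regression layer has the form
    sigmoid(a * cos_sim + b); `a` and `b` are illustrative names.
    """

    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.tensor(1.0))  # learnable slope
        self.b = nn.Parameter(torch.tensor(0.0))  # learnable bias (decision boundary)

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarities of all pairs in the mini-batch: (B, B) matrix.
        normed = F.normalize(embeddings, dim=1)
        sim = normed @ normed.t()

        # Pairwise binary targets from the speaker labels:
        # 1 if the pair belongs to the same speaker, else 0.
        target = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()

        # Learnable logistic regression layer on the similarities.
        logits = self.a * sim + self.b

        # Exclude trivial self-pairs on the diagonal (an assumption here).
        mask = ~torch.eye(len(labels), dtype=torch.bool, device=sim.device)

        # Binary cross-entropy between predicted probabilities and targets.
        return F.binary_cross_entropy_with_logits(logits[mask], target[mask])
```

Because `a` and `b` are trained jointly with the embedding network, the decision boundary between similar and dissimilar pairs is learned rather than fixed to a pre-defined margin.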