Hierarchically Attending Time-Frequency and Channel Features for Improving Speaker Verification

2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP), Pub Date: 2021-01-24, DOI: 10.1109/ISCSLP49672.2021.9362054

Chenglong Wang, Jiangyan Yi, J. Tao, Ye Bai, Zhengkun Tian

Citations: 5

Abstract

Attention-based models have recently shown powerful representation learning ability in speaker recognition. However, most models based on attention mechanisms apply attention primarily in the pooling layers. In this work, we present an end-to-end speaker verification system that leverages time-frequency and channel features hierarchically. To further improve system performance, we employ Large Margin Cosine Loss as the loss function to optimize the model. We carry out experiments on the VoxCeleb1 dataset to evaluate the effectiveness of our methods. The results suggest that our best system outperforms the i-vector + PLDA and x-vector systems by 53.3% and 7.6%, respectively.
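
To make the loss named in the abstract concrete, the following is a minimal sketch of Large Margin Cosine Loss for a single example, assuming the commonly used formulation in which a margin is subtracted from the target-class cosine before scaling and softmax. The abstract gives no hyperparameters, so the `scale` and `margin` defaults, the function names, and the toy weight matrix are illustrative assumptions and not the authors' configuration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def lmcl_loss(embedding, class_weights, label, scale=30.0, margin=0.35):
    """Large Margin Cosine Loss for one example (illustrative sketch).

    embedding:     speaker embedding, a list of floats
    class_weights: one weight vector per training speaker
    label:         index of the true speaker
    scale, margin: assumed values, not taken from the paper
    """
    emb = l2_normalize(embedding)
    # Cosine similarity between the embedding and every class weight vector.
    cosines = [
        sum(e * w for e, w in zip(emb, l2_normalize(row)))
        for row in class_weights
    ]
    # Subtract the margin from the target-class cosine only, then scale.
    logits = [
        scale * (c - margin) if j == label else scale * c
        for j, c in enumerate(cosines)
    ]
    # Numerically stable negative log-softmax of the target class.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum - logits[label]


# Toy usage: an embedding equally close to two speakers.
toy_weights = [[1.0, 0.0], [0.0, 1.0]]
loss_no_margin = lmcl_loss([1.0, 1.0], toy_weights, label=0, margin=0.0)
loss_with_margin = lmcl_loss([1.0, 1.0], toy_weights, label=0, margin=0.35)
```

With the margin set to zero the function reduces to ordinary softmax cross-entropy on scaled cosines. A positive margin raises the loss for the same input, which pushes the model to separate the target speaker from the others by at least that margin in cosine space.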

