Age-Invariant Speaker Embedding for Diarization of Cognitive Assessments

2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP). Pub Date: 2021-01-24. DOI: 10.1109/ISCSLP49672.2021.9362084

Sean Shensheng Xu, M. Mak, Ka Ho WONG, H. Meng, T. Kwok

Citations: 3

Abstract

This paper investigates an age-invariant speaker embedding approach to speaker diarization, an essential step toward automatic cognitive assessment from speech. Studies have shown that incorporating speaker traits (e.g., age and gender) can improve speaker diarization performance. However, we found that age information in the speaker embeddings is detrimental to speaker diarization when there is a severe mismatch between the age distributions of the training data and the test data. To minimize the detrimental effect of this mismatch, an adversarial training strategy is introduced to remove age variability from the utterance-level speaker embeddings. Evaluations on an interactive dialog dataset for the Montreal Cognitive Assessment (MoCA) show that the adversarial training strategy can produce age-invariant embeddings and reduce the diarization error rate (DER) by 4.33%. The approach also outperforms the conventional method even with less training data.
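For readers unfamiliar with the metric, DER is commonly defined as the sum of false-alarm speech, missed speech, and speaker-confusion time divided by the total reference speech time. The sketch below illustrates that common definition only. The function name and the durations are hypothetical and are not taken from the paper, and the abstract does not state whether the 4.33% reduction is absolute or relative.

```python
def diarization_error_rate(false_alarm, missed, confusion, total_speech):
    """Return DER as a fraction of the total reference speech time.

    All arguments are durations in seconds. This follows the common
    definition of DER; the paper's exact scoring setup (e.g., collar,
    overlap handling) is not given in the abstract.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (false_alarm + missed + confusion) / total_speech


# Hypothetical durations for illustration only (not results from the paper).
der = diarization_error_rate(
    false_alarm=12.0,    # non-speech labeled as speech
    missed=18.0,         # speech labeled as non-speech
    confusion=30.0,      # speech attributed to the wrong speaker
    total_speech=600.0,  # total reference speech time
)
print(f"DER = {der:.2%}")
```

Because the three error terms are summed, DER can exceed 100% when a system produces a large amount of false-alarm speech.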
