End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection

2021 IEEE Spoken Language Technology Workshop (SLT). Pub Date: 2021-01-19. DOI: 10.1109/SLT48900.2021.9383555

Yuki Takashima, Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Leibny Paola García-Perera, Kenji Nagamatsu

Citations: 19

Abstract

We present a conditional multitask learning method for end-to-end neural speaker diarization (EEND). EEND has shown promising performance compared with traditional clustering-based methods, especially on overlapping speech. To further improve EEND, we propose a novel multitask learning framework that solves speaker diarization together with a desired subtask while explicitly considering the dependency between the tasks. Based on the probabilistic chain rule, we optimize speaker diarization conditioned on speech activity detection and overlap detection, both of which are subtasks of speaker diarization. Experimental results show that the proposed method can leverage a subtask to model speaker diarization effectively, and that it outperforms conventional EEND systems in terms of diarization error rate.
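
As a rough illustration of why speech activity and overlap detection can be treated as subtasks of diarization, the sketch below derives both label sequences from frame-level diarization labels. In notation assumed here (not taken from the paper), with X the acoustic features, S the subtask labels, and Y the diarization labels, the chain rule gives P(Y, S | X) = P(S | X) P(Y | S, X). The function name `subtask_labels` and the toy label matrix are hypothetical, and the code shows only how the labels relate, not the authors' model or training procedure.

```python
import numpy as np

def subtask_labels(diar):
    """Derive subtask labels from diarization labels.

    diar: binary array of shape (frames, speakers); 1 means the speaker
    is active in that frame. (Hypothetical helper for illustration.)
    """
    active = diar.sum(axis=1)            # number of active speakers per frame
    sad = (active >= 1).astype(int)      # speech activity: at least one speaker
    overlap = (active >= 2).astype(int)  # overlap: two or more speakers
    return sad, overlap

# Toy example: 4 frames, 2 speakers.
labels = np.array([[0, 0],
                   [1, 0],
                   [1, 1],
                   [0, 1]])
sad, overlap = subtask_labels(labels)
print(sad.tolist(), overlap.tolist())
```

Because both subtask sequences are deterministic functions of the diarization labels, they come at no extra annotation cost, which is one reason they are natural conditioning targets in a multitask setup.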



