Overlapped Speech Detection for Improved Speaker Diarization on Tamil Dataset

2022 6th SLAAI International Conference on Artificial Intelligence (SLAAI-ICAI). Pub Date: 2022-12-01. DOI: 10.1109/SLAAI-ICAI56923.2022.10002438

S.T. Jarashanth, K. Ahilan, R. Valluvan, T. Thiruvaran, A. Kaneswaran



Abstract

Speaker diarization is the task of partitioning a speech signal into homogeneous segments corresponding to speaker identities. The existing literature on speaker diarization has experimented extensively with English but not with Tamil, so we introduce a Tamil test dataset. An overlapped speech segment is a part of an audio clip where two or more speakers speak simultaneously. Overlapped speech regions proportionally degrade the performance of a speaker diarization system because of the difficulty of identifying the individual speakers. This study proposes an overlapped speech detection (OSD) model that discards the non-speech segments and feeds the speech segments into a Convolutional Recurrent Neural Network used as a binary classifier with two classes: single-speaker speech and overlapped speech. The OSD model is integrated into a speaker diarizer, and the performance gains in terms of Diarization Error Rate on the standard VoxConverse dataset and our Tamil dataset are 5.6% and 13.4%, respectively.
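To illustrate why overlapped regions hurt a diarizer, the following minimal sketch computes a frame-level Diarization Error Rate as (missed speech + false alarm + speaker confusion) divided by total reference speaker time. Everything in it is an assumption made for illustration: the function name `frame_der`, the toy frame labels, and the premise that hypothesis labels are already mapped to reference labels. The abstract does not specify how the OSD output is used inside the diarizer, so the example only contrasts a hypothesis that assigns one speaker per frame with the reference annotation; it does not reproduce the paper's method or its reported numbers.

```python
def frame_der(reference, hypothesis):
    """Frame-level DER for two equal-length lists of speaker-label sets.

    Assumes hypothesis labels are already mapped to reference labels
    (a real scorer would first find the optimal label mapping).
    """
    missed = false_alarm = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        n_ref, n_hyp = len(ref), len(hyp)
        n_correct = len(ref & hyp)
        missed += max(0, n_ref - n_hyp)             # reference speakers with no hypothesis
        false_alarm += max(0, n_hyp - n_ref)        # hypothesized speakers with no reference
        confusion += min(n_ref, n_hyp) - n_correct  # speech attributed to the wrong speaker
        total += n_ref                              # total reference speaker time, in frames
    return (missed + false_alarm + confusion) / total if total else 0.0


# Toy annotation (hypothetical): frames 3 and 4 contain overlapped speech.
reference = [{"A"}, {"A"}, {"A", "B"}, {"A", "B"}, {"B"}, {"B"}, set(), {"A"}]

# A diarizer that can output only one speaker per frame misses
# one speaker in every overlapped frame.
single_speaker = [{"A"}, {"A"}, {"A"}, {"B"}, {"B"}, {"B"}, set(), {"A"}]

print(f"single-speaker hypothesis DER: {frame_der(reference, single_speaker):.3f}")
```

In this toy case the only errors are the two missed speakers in the overlapped frames, out of nine frames of reference speaker time, which is the kind of error an overlap detector gives the diarizer a chance to recover.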


