Overlapped Speech Detection for Improved Speaker Diarization on Tamil Dataset

S.T. Jarashanth, K. Ahilan, R. Valluvan, T. Thiruvaran, A. Kaneswaran
Published in: 2022 6th SLAAI International Conference on Artificial Intelligence (SLAAI-ICAI)
Publication date: 2022-12-01
DOI: 10.1109/SLAAI-ICAI56923.2022.10002438 (https://doi.org/10.1109/SLAAI-ICAI56923.2022.10002438)

Abstract

Speaker diarization is the task of partitioning a speech signal into homogeneous segments corresponding to speaker identities. The existing literature on speaker diarization has experimented extensively with English but not with Tamil, so we introduce a Tamil test dataset. An overlapped speech segment is a part of an audio clip where two or more speakers speak simultaneously. Overlapped speech regions degrade the performance of a speaker diarization system proportionally, because identifying the individual speakers in such regions is complex. This study proposes an overlapped speech detection (OSD) model that discards the non-speech segments and feeds the speech segments into a Convolutional Recurrent Neural Network, which acts as a binary classifier with two classes: single-speaker speech and overlapped speech. The OSD model is integrated into a speaker diarizer, and the performance gains in terms of Diarization Error Rate on the standard VoxConverse dataset and on our Tamil dataset are 5.6% and 13.4%, respectively.
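
To illustrate why overlapped regions affect the Diarization Error Rate (DER), the following is a minimal sketch of a frame-level DER computation. It is not the authors' evaluation code. It assumes that each frame is labelled with the set of active speakers, that hypothesis speaker labels have already been mapped to reference labels, and that no forgiveness collar is applied. The function name `frame_der` and the toy labels are hypothetical.

```python
from typing import List, Set


def frame_der(reference: List[Set[str]], hypothesis: List[Set[str]]) -> float:
    """Frame-level Diarization Error Rate (simplified illustration).

    Each list entry is the set of speakers active in one frame.
    Assumption: hypothesis labels are already mapped to reference labels.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")

    error = 0  # missed speech + false alarm + speaker confusion, in speaker-frames
    total = 0  # total reference speaker-frames
    for ref, hyp in zip(reference, hypothesis):
        correct = len(ref & hyp)
        # max(N_ref, N_hyp) - N_correct counts every speaker slot that is
        # missed, falsely added, or attributed to the wrong speaker.
        error += max(len(ref), len(hyp)) - correct
        total += len(ref)

    if total == 0:
        raise ValueError("reference contains no speech")
    return error / total


# Toy example (hypothetical data): 10 frames, two of which are overlapped.
reference = [{"A"}] * 4 + [{"A", "B"}] * 2 + [{"B"}] * 4

# A diarizer that assigns exactly one speaker per frame misses one speaker
# in every overlapped frame.
single_speaker_output = [{"A"}] * 5 + [{"B"}] * 5

# A diarizer that uses overlap information can assign both speakers.
overlap_aware_output = [{"A"}] * 4 + [{"A", "B"}] * 2 + [{"B"}] * 4

der_single = frame_der(reference, single_speaker_output)
der_overlap = frame_der(reference, overlap_aware_output)
print(f"one speaker per frame: DER = {der_single:.3f}")
print(f"overlap-aware:         DER = {der_overlap:.3f}")
```

In this toy case every overlapped frame contributes one missed speaker when only a single label is emitted per frame, so the error grows with the amount of overlap. The figures produced here are illustrative only and are unrelated to the results reported in the abstract.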