Evaluating Deep Neural Network-based Speaker Verification Systems on Sinhala and Tamil Datasets

S. P. D. Anuraj, S.T. Jarashanth, K. Ahilan, R. Valluvan, Tharmarajah Thiruvaran, A. Kaneswaran
Published in: 2022 6th SLAAI International Conference on Artificial Intelligence (SLAAI-ICAI), 2022-12-01
DOI: 10.1109/SLAAI-ICAI56923.2022.10002663 (https://doi.org/10.1109/SLAAI-ICAI56923.2022.10002663)

Abstract

Speaker verification, a form of biometric identification, determines whether an input utterance belongs to the claimed identity. Existing speaker verification models report performance mainly on English, and no study has experimented with Sinhala and Tamil datasets. This study proposes a semi-automated pipeline for curating Sinhala and Tamil datasets from YouTube videos filmed under noisy, unconstrained conditions that represent real-world scenarios. Both the Sinhala and Tamil datasets include utterances for 140 persons of interest (POIs), with more than 300 utterances per POI across one or more genres: interviews, speeches, and vlogs. The study also investigates how domain mismatch affects a speaker verification model trained on English and applied to Sinhala and Tamil. Two deep neural network models trained on English show significant performance drops on the Sinhala and Tamil datasets relative to an English dataset, as expected under domain mismatch; however, AM-softmax performed better than vanilla softmax. In future work, robust speaker verification models with domain adaptation techniques will be built to improve performance on the Sinhala and Tamil datasets.
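To make the comparison between vanilla softmax and AM-softmax concrete, the following is a minimal sketch of the two training losses and of a cosine-similarity verification decision. It is not the authors' implementation: the abstract does not give the model architecture, the scale and margin, or the scoring rule, so `scale=30.0` and `margin=0.35` are assumed here as commonly used AM-softmax settings, and the `verify` function with its `threshold=0.5` is purely hypothetical.

```python
import math

def softmax_loss(logits, target):
    """Cross-entropy over raw logits (the 'vanilla softmax' baseline)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def am_softmax_loss(embedding, class_weights, target, scale=30.0, margin=0.35):
    """AM-softmax: cosine logits, with `margin` subtracted from the target class.

    `scale` and `margin` are assumed values, not taken from the paper.
    """
    cosines = [cosine(embedding, w) for w in class_weights]
    logits = [
        scale * (c - margin if i == target else c)
        for i, c in enumerate(cosines)
    ]
    return softmax_loss(logits, target)

def verify(enrol_embedding, test_embedding, threshold=0.5):
    """Hypothetical decision rule: accept the claimed identity if the two
    speaker embeddings are similar enough. The threshold is illustrative."""
    return cosine(enrol_embedding, test_embedding) >= threshold

if __name__ == "__main__":
    speaker_weights = [[1.0, 0.0], [0.0, 1.0]]  # toy class centres
    emb = [0.9, 0.1]                            # toy speaker embedding
    print(am_softmax_loss(emb, speaker_weights, target=0))
    print(verify(emb, [1.0, 0.0]))
```

Subtracting the margin from the target-class cosine means an embedding must sit closer to its own speaker's class centre to reach the same loss, which encourages tighter, better separated speaker clusters than plain softmax does.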