Neural Network Models for Russian Language Speaker Recognition

Andrey Popov, Sergey A. Ivanov
Published in: 2021 International Conference on Quality Management, Transport and Information Security, Information Technologies (IT&QM&IS)
Publication date: 2021-09-06
DOI: 10.1109/ITQMIS53292.2021.9642756 (https://doi.org/10.1109/ITQMIS53292.2021.9642756)
Citations: 1

Abstract

Speaker recognition is a form of biometrics that treats a person's voice as a unique biological characteristic for identification, verification, or speaker diarisation. Speaker recognition typically relies on acoustic features of human speech. These acoustic patterns reflect both the anatomy and the learned behavioral patterns of the individual. This work evaluates the performance of state-of-the-art speaker recognition approaches on a dataset of Russian voices. We compare four variants of the Time Delay Neural Network based on Emphasized Channel Attention, Propagation, and Aggregation (ECAPA-TDNN): a model pre-trained on English data, a model trained directly on Russian data, a pre-trained model additionally trained on Russian data, and a model with some improvements to the TDNN architecture.
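The abstract does not say how the compared models were scored. One common way to compare speaker verification systems is to score trial pairs of speaker embeddings by cosine similarity and report the equal error rate (EER). The following is a minimal sketch under that assumption; the function names and the toy scores are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping thresholds over the observed scores.

    Returns the mean of the false acceptance rate (FAR) and false
    rejection rate (FRR) at the threshold where the two are closest.
    """
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy trial scores (hypothetical): same-speaker pairs vs. different-speaker pairs.
same_speaker = [0.9, 0.4]
different_speaker = [0.5, 0.1]
print(equal_error_rate(same_speaker, different_speaker))
```

A lower EER means the embeddings separate same-speaker and different-speaker trials better, so in principle the four model variants in the abstract could be ranked by running the same trial list through each of them.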