Neural Network Models for Russian Language Speaker Recognition
Andrey Popov, Sergey A. Ivanov
2021 International Conference on Quality Management, Transport and Information Security, Information Technologies (IT&QM&IS), 2021-09-06
DOI: 10.1109/ITQMIS53292.2021.9642756
Speaker recognition is a form of biometrics that treats a person's voice as a unique biological characteristic for identification, verification or speaker diarisation. Speaker recognition usually relies on acoustic features of human speech, and these acoustic patterns reflect both the anatomy and the learned behavioral patterns of the individual. This work evaluates the performance of state-of-the-art speaker recognition approaches on a dataset of Russian voices. We compare a Time Delay Neural Network model based on Emphasized Channel Attention, Propagation and Aggregation (ECAPA-TDNN) in four configurations: pre-trained on English data, trained directly on Russian data, pre-trained on English data and then additionally trained on Russian data, and a variant with some modifications to the TDNN architecture.
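The abstract does not say how the compared models are scored. One common setup for verification with ECAPA-TDNN-style speaker embeddings compares two embeddings by cosine similarity. Models are then ranked by equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The sketch below illustrates that setup under these assumptions only. The function names `cosine_score` and `equal_error_rate` are hypothetical, and the toy vectors stand in for embeddings that a trained model would produce.

```python
import math
from typing import Sequence


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings (assumed scoring backend)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Approximate EER over a list of verification trials.

    labels[i] is True for a target (same-speaker) trial. Every observed score
    is tried as a threshold (a trial is accepted when score >= threshold), and
    the mean of FAR and FRR is returned at the threshold where they are closest.
    The sweep is quadratic, which is fine for a small illustration.
    """
    n_target = sum(1 for lab in labels if lab)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        # False acceptances: non-target trials scored at or above the threshold.
        fa = sum(1 for s, lab in zip(scores, labels) if not lab and s >= t)
        # False rejections: target trials scored below the threshold.
        fr = sum(1 for s, lab in zip(scores, labels) if lab and s < t)
        far, frr = fa / n_nontarget, fr / n_target
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


if __name__ == "__main__":
    enroll = [0.9, 0.1, 0.3]    # toy "enrollment" embedding
    same = [0.8, 0.2, 0.35]     # toy embedding of the same speaker
    other = [-0.2, 0.9, 0.1]    # toy embedding of a different speaker
    scores = [cosine_score(enroll, same), cosine_score(enroll, other)]
    print(scores, equal_error_rate(scores, [True, False]))
```

With real data, each trial pairs an enrollment utterance with a test utterance. In that setting, the English-pretrained, Russian-trained, and fine-tuned models would each yield their own score list, and a lower EER indicates better verification.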