An improved Tibetan speaker recognition method based on ResNet
Zhenye Gan, Jincheng Li, Ziqian Qu, Yue Yu
2023 2nd Asia Conference on Electrical, Power and Computer Engineering (EPCE), published 2023-04-01
DOI: 10.1109/epce58798.2023.00015 (https://doi.org/10.1109/epce58798.2023.00015)
Over the past decade, relatively few studies have addressed speaker recognition for minority languages, Tibetan in particular, and most of them rely on traditional recognition methods. This paper applies mainstream neural network learning to Tibetan: it proposes a new Fast ResNet-101 structure, obtained by modifying the module structure of the residual network (ResNet), and builds a complete Tibetan speaker recognition system around it. The improved model is evaluated with two metric-learning loss functions, Prototypical and Angular Prototypical. With Fast ResNet-34 and Fast ResNet-50 as baselines, the experiments show that the deeper Fast ResNet-101 model performs best, and that the model trained with the Angular Prototypical loss recognizes speakers better than the one trained with the Prototypical loss. The recognition error rate reaches 3.72%.
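The abstract does not spell out how the angular prototype loss is computed. The sketch below follows the usual formulation of that loss in the metric-learning literature and is not the authors' implementation: each speaker in a mini-batch contributes one query embedding and a prototype (the centroid of that speaker's other embeddings), and the loss is a softmax cross-entropy over scaled cosine similarities between queries and prototypes. The function names, the choice of the last utterance as the query, the values `w=10.0` and `b=-5.0`, and the toy 2-D embeddings are all assumptions made for illustration; in training, `w` and `b` would be learned and the embeddings would come from the network.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def angular_prototypical_loss(batch, w=10.0, b=-5.0):
    """Angular prototypical loss for one mini-batch (illustrative sketch).

    batch[j] is the list of embeddings for speaker j (two or more).
    Assumption: the last embedding is the query and the centroid of the
    others is the prototype. w (scale) and b (offset) are fixed here but
    would be learnable parameters in a real training loop.
    """
    queries = [utts[-1] for utts in batch]
    prototypes = [centroid(utts[:-1]) for utts in batch]
    total = 0.0
    for j, q in enumerate(queries):
        # Scaled cosine similarity of query j to every speaker's prototype.
        logits = [w * cosine(q, c) + b for c in prototypes]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the query's own speaker as the target class.
        total += log_sum_exp - logits[j]
    return total / len(batch)


# Toy data: two speakers, three 2-D embeddings each (made up for the demo).
well_separated = [
    [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]],
    [[0.0, 1.0], [0.1, 0.9], [0.1, 1.0]],
]
# Same prototypes, but each query lies next to the other speaker's prototype.
confused = [
    [[1.0, 0.0], [0.9, 0.1], [0.1, 1.0]],
    [[0.0, 1.0], [0.1, 0.9], [1.0, 0.1]],
]

print("well separated:", angular_prototypical_loss(well_separated))
print("confused:      ", angular_prototypical_loss(confused))
```

Because the similarity is a cosine, the loss depends only on the angle between a query and each prototype, not on embedding magnitude. In the toy run, the well-separated batch yields a much smaller loss than the confused one.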