Jiawei Liu, Chenyang Jin, Jingxi Liang, Luoqi Wang
2021 4th International Conference on Digital Medicine and Image Processing. Published 2021-11-12. DOI: 10.1145/3506651.3506660 (https://doi.org/10.1145/3506651.3506660)
Comparison of Different Models of Voiceprint Recognition used in Automatic Door Lock System (August 2021)
For any system, reliability and construction cost are two major determinants of whether it can be used in daily life. In voiceprint recognition, users are often forced to choose between accuracy and convenience. This paper examines the performance of two speaker verification models in different environments and asks whether cost and results can be balanced. A Gaussian mixture model with a universal background model (GMM-UBM) and a deep-learning method are selected to represent two common approaches to speaker verification. Comparing the two, we find that the deep-learning method depends more heavily on large training datasets: it performs worse than the GMM-UBM model when both are trained on the same dataset containing only a few samples, whereas both methods reach nearly 100% accuracy when given a sufficiently large training dataset. Moreover, despite attempts to raise accuracy by tuning the settings of both models, excellent performance appears only when large amounts of training data are available and little noise is present.
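The abstract does not describe the scoring procedure, so the following is only a minimal sketch of how a GMM-UBM verifier commonly reaches its accept/reject decision: each feature frame is scored under the claimed speaker's GMM and under the UBM, and the average log-likelihood ratio is compared with a threshold. All model parameters, frame values, the threshold, and the function names are hypothetical toy choices for illustration. They are not taken from the paper and are not the authors' implementation.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) triples."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(logs)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(value - peak) for value in logs))

def average_llr(frames, speaker_gmm, ubm):
    """Mean per-frame log-likelihood ratio: speaker model versus UBM."""
    total = sum(
        gmm_log_likelihood(f, speaker_gmm) - gmm_log_likelihood(f, ubm)
        for f in frames
    )
    return total / len(frames)

def verify(frames, speaker_gmm, ubm, threshold=0.0):
    """Accept the claimed identity when the average LLR exceeds the threshold."""
    return average_llr(frames, speaker_gmm, ubm) > threshold

# Toy 2-D models with hypothetical values (assumption, not from the paper).
# The UBM is broad; the speaker model is tighter, as if adapted to one voice.
UBM = [(0.5, (0.0, 0.0), (4.0, 4.0)), (0.5, (3.0, 3.0), (4.0, 4.0))]
SPEAKER = [(0.5, (1.0, 1.0), (0.5, 0.5)), (0.5, (2.0, 2.0), (0.5, 0.5))]

# Hypothetical feature frames: one utterance near the speaker model, one far from it.
target_frames = [(1.1, 0.9), (1.9, 2.1), (1.4, 1.6)]
impostor_frames = [(-2.0, -1.5), (4.5, 5.0), (-1.0, 4.0)]

if __name__ == "__main__":
    print("target accepted:", verify(target_frames, SPEAKER, UBM))
    print("impostor accepted:", verify(impostor_frames, SPEAKER, UBM))
```

In a door-lock setting the threshold would be the main tuning knob: raising it lowers the chance of admitting an impostor at the cost of rejecting the legitimate user more often, which is one way the accuracy-versus-convenience trade-off mentioned in the abstract could show up in practice.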