Comparison of Different Models of Voiceprint Recognition used in Automatic Door Lock System (August 2021)

2021 4th International Conference on Digital Medicine and Image Processing. Pub Date: 2021-11-12. DOI: 10.1145/3506651.3506660

Jiawei Liu, Chenyang Jin, Jingxi Liang, Luoqi Wang


Citations: 0

Abstract



For any system, reliability and construction cost are two major determinants of whether it can be used day to day. In voiceprint recognition, people are often forced to choose between accuracy and convenience. This paper examines the performance of two speaker verification models in different environments and asks whether cost and results can be balanced. A Gaussian Mixture Model with a universal background model (GMM-UBM) and a deep-learning method are selected to represent two common approaches to speaker verification. Comparing the two, we find that the deep-learning method has the greater need for large training datasets: it performs worse than GMM-UBM when both are trained on the same dataset containing only a few samples, whereas both methods reach nearly 100% accuracy when given a large enough training dataset. Moreover, despite attempts to raise accuracy by tuning the settings of both models, excellent performance appears to occur only when large amounts of training data are available and little noise is present.
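The abstract does not describe how either model was implemented. As a rough illustration of how a GMM-UBM system typically reaches an accept/reject decision, the sketch below scores an utterance by the average per-frame log-likelihood ratio between a speaker model and the universal background model, then compares that score with a threshold. Everything in it is an assumption made for illustration: the function names, the two-dimensional toy features, the hand-picked mixture parameters, and the threshold of 0 are hypothetical and are not taken from the paper. In a real system the mixtures would be trained (for example with EM and MAP adaptation) on acoustic features such as MFCCs.

```python
import math


def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_loglik(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) triples."""
    terms = [math.log(w) + diag_gaussian_logpdf(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def llr_score(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio: speaker model versus UBM."""
    total = sum(gmm_loglik(f, speaker_gmm) - gmm_loglik(f, ubm) for f in frames)
    return total / len(frames)


def verify(frames, speaker_gmm, ubm, threshold=0.0):
    """Accept the claimed identity if the score exceeds the threshold."""
    return llr_score(frames, speaker_gmm, ubm) > threshold


# Hypothetical toy parameters (not from the paper): a broad background
# model and a narrower model for one enrolled speaker.
UBM = [(0.5, [0.0, 0.0], [4.0, 4.0]), (0.5, [3.0, 3.0], [4.0, 4.0])]
SPEAKER = [(0.5, [1.0, 1.0], [0.5, 0.5]), (0.5, [2.0, 2.0], [0.5, 0.5])]

# Hypothetical feature frames: one utterance near the speaker model,
# one far from it.
genuine = [[1.1, 0.9], [1.9, 2.1], [1.0, 1.2]]
impostor = [[-2.0, -1.5], [5.0, 4.5], [-1.0, 6.0]]

if __name__ == "__main__":
    print("genuine accepted:", verify(genuine, SPEAKER, UBM))
    print("impostor accepted:", verify(impostor, SPEAKER, UBM))
```

The threshold is where the accuracy-versus-convenience trade-off mentioned in the abstract shows up in practice: raising it rejects more impostors but also more genuine attempts, so a door-lock deployment would need to tune it on held-out recordings.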
