Deep Learning-based Analysis of Voiceprint Data Mining

Jacky Chun-ki Tang
{"title":"Deep Learning-based Analysis of Voiceprint Data Mining","authors":"Jacky Chun-ki Tang","doi":"10.56828/jser.2022.1.1.1","DOIUrl":null,"url":null,"abstract":": In the information age, the intelligent data mining method represented by deep learning is playing an important role in various fields at present. It is necessary to study how to efficiently use the intelligent data mining method to obtain valuable information from massive information. Open-set voiceprint recognition is realized by intelligent data mining technology. Therefore, it is of great practical significance to achieve rapid and accurate identification of the speaker's identity. Because the traditional voiceprint recognition method has insufficient ability to distinguish the speakers inside and outside the set, it often leads to a high false recognition rate. Mining parameters containing more speakers’ personality characteristics and how to calculate the threshold become the bottleneck problems of open set voiceprint recognition. Therefore, this paper adopts the deep confidence network stacked by three layers of restricted Boltzmann machines as the deep acoustic feature extractor. The mel-frequency cepstral coefficients of 24-dimensional basic acoustic features are mapped to 256-dimensional feature space, and the parameters of deep acoustic features containing more speaker's personality characteristics are obtained. Then, an open-set adaptive threshold calculation algorithm is obtained. In this paper, the similarity value of deep acoustic features is calculated by the Gaussian mixture model, and the maximum inter-class variance of the similarity value is calculated by the OTSU algorithm. When the inter-class variance is the maximum, the similarity value is the best threshold. 
The experimental test shows that the algorithm for calculating threshold based on deep learning proposed in this paper has a lower false rejection rate and lower false rejection rate.","PeriodicalId":13763,"journal":{"name":"International Journal of Applied Science and Engineering Research","volume":"1 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Applied Science and Engineering Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.56828/jser.2022.1.1.1","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In the information age, intelligent data mining methods represented by deep learning play an important role across many fields, and it is necessary to study how to use them efficiently to extract valuable information from massive data. Open-set voiceprint recognition is realized with such intelligent data mining technology, so rapid and accurate identification of a speaker's identity is of great practical significance. Because traditional voiceprint recognition methods cannot adequately distinguish speakers inside the enrolled set from those outside it, they often suffer a high false recognition rate. Extracting parameters that carry more of a speaker's individual characteristics, and calculating a suitable decision threshold, have therefore become the bottleneck problems of open-set voiceprint recognition. This paper adopts a deep belief network stacked from three restricted Boltzmann machines as a deep acoustic feature extractor: 24-dimensional mel-frequency cepstral coefficients (MFCCs) are mapped into a 256-dimensional feature space, yielding deep acoustic features that carry more of the speaker's individual characteristics. An open-set adaptive threshold calculation algorithm is then derived: the similarity of the deep acoustic features is scored with a Gaussian mixture model, and the Otsu algorithm computes the between-class variance of the similarity values; the similarity value at which the between-class variance is maximal is taken as the optimal threshold. Experimental tests show that the proposed deep learning-based threshold calculation algorithm achieves both a lower false rejection rate and a lower false acceptance rate.
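The threshold-selection step described above can be illustrated in isolation. The sketch below is a minimal, hypothetical implementation of Otsu's between-class-variance criterion applied to a one-dimensional array of similarity scores (the abstract applies it to GMM similarity values); the function name, the histogram binning, and the class labels in the comments are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def otsu_threshold(scores, bins=256):
    """Return the score value that maximizes between-class variance.

    `scores` is a 1-D array of similarity values; the returned value
    separates them into a low-score class and a high-score class.
    """
    hist, edges = np.histogram(scores, bins=bins)
    prob = hist.astype(float) / hist.sum()          # score distribution
    centers = (edges[:-1] + edges[1:]) / 2.0        # bin midpoints

    best_t, best_var = centers[0], -1.0
    for i in range(1, bins):
        w0 = prob[:i].sum()                         # weight of low-score class
        w1 = prob[i:].sum()                         # weight of high-score class
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = (prob[:i] * centers[:i]).sum() / w0   # low-class mean score
        mu1 = (prob[i:] * centers[i:]).sum() / w1   # high-class mean score
        var_between = w0 * w1 * (mu0 - mu1) ** 2    # between-class variance
        if var_between > best_var:
            best_var, best_t = var_between, centers[i]
    return best_t
```

On a bimodal score distribution (e.g. impostor scores clustered low, genuine-speaker scores clustered high), the returned threshold falls in the valley between the two modes, which is what makes it usable as an adaptive accept/reject boundary for open-set recognition.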