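Modelling Mutual Information between Voiceprint and Optimal Number of Mel-Frequency Cepstral Coefficients in Voice Discrimination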

Kin Wah Edward Lin, Tian Feng, Natalie Agus, C. So, Simon Lui
2014 13th International Conference on Machine Learning and Applications
Published: 2014-12-03
DOI: 10.1109/ICMLA.2014.9 (https://doi.org/10.1109/ICMLA.2014.9)
Citations: 5

Abstract

In this paper, we study the relationship between the voiceprint and the optimal number of Mel-frequency cepstral coefficients (MFCCs). The voiceprint is modelled as a sub-MFCC matrix containing the first d MFCCs. We model the relationship through information theory and formulate it as a mutual information maximization problem subject to a constraint on the probabilities. The solution of this optimization problem provides the optimal number of MFCCs, D, among these d, which yields the highest classification accuracy in voice discrimination, together with a confidence level. This study is motivated by the need to understand the use of MFCCs, which have proliferated since their invention as a means of discriminating voices. We evaluate our model by comparing it with the leave-one-out cross-validation (LOOCV) results of a common multi-class classifier, the Supervised Learning Gaussian Mixture Model (SLGMM), on a set of spoken words and a cappella solo vocal performances. The experimental results show that our model is a more comprehensive feature selection criterion for MFCCs than the de facto technique, LOOCV.
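The abstract does not state the objective function or the probability constraint, so the following is only a generic illustration of choosing a coefficient count by maximizing mutual information, not the authors' formulation. It assumes that, for each candidate d, a table of joint counts between the true voice label and some label derived from the first d MFCCs is already available. The function names and the example counts are hypothetical.

```python
from math import log2


def mutual_information(joint_counts):
    """Mutual information (in bits) of a 2-D table of joint counts."""
    total = sum(sum(row) for row in joint_counts)
    row_sums = [sum(row) for row in joint_counts]
    col_sums = [sum(col) for col in zip(*joint_counts)]
    mi = 0.0
    for i, row in enumerate(joint_counts):
        for j, n in enumerate(row):
            if n > 0:  # empty cells contribute nothing (0 * log 0 := 0)
                mi += (n / total) * log2(n * total / (row_sums[i] * col_sums[j]))
    return mi


def best_num_coefficients(tables_by_d):
    """Return the d whose joint table has the largest mutual information."""
    return max(tables_by_d, key=lambda d: mutual_information(tables_by_d[d]))


# Hypothetical counts: rows are true voices, columns are derived labels.
tables_by_d = {
    1: [[5, 5], [5, 5]],    # derived label is independent of the voice
    2: [[8, 2], [2, 8]],    # partially informative
    3: [[10, 0], [0, 10]],  # derived label identifies the voice
}

if __name__ == "__main__":
    for d, table in tables_by_d.items():
        print(d, round(mutual_information(table), 4))
    print("selected d:", best_num_coefficients(tables_by_d))
```

In this toy setting the selection depends only on the count tables, so no classifier has to be retrained per candidate d. Whether that matches the paper's comparison against LOOCV cannot be determined from the abstract alone.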