Speaker information from subband energies of Linear Prediction residual

2010 National Conference On Communications (NCC). Pub Date: 2010-03-15. DOI: 10.1109/NCC.2010.5430209

D. Pati, S. Prasanna


Citations: 25

Abstract

The objective of this work is to demonstrate that significant speaker information is present in the subband energies of the Linear Prediction (LP) residual. The LP residual mostly contains excitation source information. Subband energies extracted with a mel filterbank, followed by cepstral analysis, provide a compact representation. The resulting cepstral values are termed Residual-mel Frequency Cepstral Coefficients (R-MFCC). Speaker identification experiments using R-MFCC features and a Gaussian mixture model (GMM) on a subset of 30 speakers from NIST-1999 yield 87% accuracy. MFCC extracted directly from speech also yield 87% accuracy. Combining the two yields 90% accuracy, indicating that R-MFCC captures a different aspect of speaker information.
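The feature pipeline the abstract describes (LP residual, mel filterbank subband energies, cepstral analysis) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract gives no parameter values, so the 8 kHz sampling rate, LP order of 10, 24 mel filters, 13 cepstral coefficients, the autocorrelation method for LP analysis, and all function names below are assumptions chosen for illustration. Windowing, frame blocking, and the GMM back end are left out.

```python
import cmath
import math


def lp_coefficients(frame, order):
    """Autocorrelation-method LP (Levinson-Durbin): s[n] ~ sum_k a[k] * s[n-k]."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a, err = [0.0] * (order + 1), r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def lp_residual(frame, coeffs):
    """Inverse-filter the frame: e[n] = s[n] - sum_k a[k] * s[n-k]."""
    return [frame[n] - sum(c * frame[n - k]
                           for k, c in enumerate(coeffs, start=1) if n - k >= 0)
            for n in range(len(frame))]


def power_spectrum(x):
    """Squared DFT magnitude for bins 0..N/2 (naive O(N^2) DFT, fine for a demo)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_subband_energies(power, sample_rate, n_filters):
    """Weight the power spectrum with triangular filters spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = (sample_rate / 2.0) / (len(power) - 1)
    energies = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        total = 0.0
        for b, p in enumerate(power):
            f = b * bin_hz
            if lo < f <= mid:
                total += p * (f - lo) / (mid - lo)
            elif mid < f < hi:
                total += p * (hi - f) / (hi - mid)
        energies.append(total)
    return energies


def dct_ii(values, n_coeffs):
    """Unnormalised DCT-II, the usual final step of cepstral analysis."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_coeffs)]


def r_mfcc(frame, sample_rate=8000, lp_order=10, n_filters=24, n_ceps=13):
    """R-MFCC of one frame; all default parameter values are assumptions."""
    residual = lp_residual(frame, lp_coefficients(frame, lp_order))
    energies = mel_subband_energies(power_spectrum(residual), sample_rate, n_filters)
    log_energies = [math.log(max(e, 1e-12)) for e in energies]  # floor avoids log(0)
    return dct_ii(log_energies, n_ceps)
```

Calling `r_mfcc(frame)` on a list of samples for one speech frame returns a list of `n_ceps` coefficients. Running the same mel filterbank and DCT steps on the frame itself instead of its residual would give conventional MFCC, which is the comparison the abstract reports.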
