Speaker information from subband energies of Linear Prediction residual

2010 National Conference On Communications (NCC). Pub Date: 2010-03-15. DOI: 10.1109/NCC.2010.5430209

D. Pati, S. Prasanna

Citations: 25

Abstract

The objective of this work is to demonstrate that the subband energies of the Linear Prediction (LP) residual carry significant speaker information. The LP residual mostly contains excitation source information. Extracting the subband energies with a mel filterbank and then applying cepstral analysis provides a compact representation. The resulting cepstral values are termed Residual-mel Frequency Cepstral Coefficients (R-MFCC). Speaker identification studies using R-MFCC features and Gaussian mixture models (GMM) on a subset of 30 speakers from NIST-1999 achieve 87% accuracy. MFCC extracted directly from speech give 87% accuracy. Further, combining the two gives 90% accuracy, indicating that R-MFCC capture a different aspect of speaker information.
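
The R-MFCC pipeline described in the abstract (LP analysis, inverse filtering to obtain the residual, mel filterbank energies, cepstral analysis) can be sketched for a single frame as below. This is only an illustrative sketch, not the authors' implementation: the abstract does not state the LP order, frame length, number of filters, number of cepstral coefficients, FFT size, or sampling rate, so the defaults used here (order 10, 24 filters, 13 coefficients, 512-point FFT, 8 kHz) are assumptions, as are all function names.

```python
import numpy as np

def lp_coefficients(frame, order):
    """Autocorrelation-method LP: a[k] such that s[n] ~ sum_k a[k] * s[n-k]."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    # Toeplitz normal equations R a = r[1:], lightly regularised for stability.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + (1e-9 * r[0] + 1e-12) * np.eye(order)
    return np.linalg.solve(R, r[1:order + 1])

def lp_residual(frame, a):
    """Inverse filter the frame: e[n] = s[n] - sum_k a[k] * s[n-k]."""
    frame = np.asarray(frame, dtype=float)
    inverse_filter = np.concatenate(([1.0], -a))
    return np.convolve(frame, inverse_filter)[:len(frame)]

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced uniformly on the mel scale from 0 to sr/2."""
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel2hz(np.linspace(0.0, hz2mel(sr / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def r_mfcc(frame, sr, order=10, n_filters=24, n_ceps=13, n_fft=512):
    """Cepstral coefficients of the mel subband energies of the LP residual.

    All default parameter values are assumptions, not taken from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    window = np.hamming(len(frame))
    a = lp_coefficients(frame * window, order)
    residual = lp_residual(frame, a)
    # Power spectrum of the windowed residual, then mel subband energies.
    power = np.abs(np.fft.rfft(residual * window, n_fft)) ** 2
    log_energies = np.log(mel_filterbank(n_filters, n_fft, sr) @ power + 1e-12)
    # Unnormalised DCT-II of the log energies gives the cepstral values.
    k = np.arange(n_ceps)
    m = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(k, m + 0.5) / n_filters)
    return dct @ log_energies
```

Applying the same mel filterbank and DCT steps to the speech frame itself, rather than to its residual, would give conventional MFCC, which is the baseline the abstract compares against. How the two feature sets are combined is not specified in the abstract and is not sketched here.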
