Speaker information from subband energies of Linear Prediction residual
D. Pati, S. Prasanna
2010 National Conference On Communications (NCC), 2010-03-15
DOI: 10.1109/NCC.2010.5430209 (https://doi.org/10.1109/NCC.2010.5430209)
Citations: 25
Abstract
This work demonstrates that the subband energies of the Linear Prediction (LP) residual carry significant speaker information. The LP residual mostly contains excitation source information. Subband energies extracted with a mel filterbank, followed by cepstral analysis, provide a compact representation. The resulting cepstral values are termed Residual-mel Frequency Cepstral Coefficients (R-MFCC). Speaker identification studies using R-MFCC features and a Gaussian mixture model (GMM) on a subset of 30 speakers from NIST-1999 achieve 87% accuracy. MFCC extracted directly from speech also achieve 87% accuracy. Combining the two yields 90% accuracy, indicating that R-MFCC capture a different aspect of speaker information.
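The R-MFCC pipeline described in the abstract (LP inverse filtering, mel filterbank energies, cepstral analysis) can be sketched roughly as below. This is a minimal illustration and not the authors' implementation: the function names, the 8 kHz sampling rate, 20 ms frames, LP order 10, 24 mel filters, and 13 cepstral coefficients are all assumptions chosen for the example, since the abstract does not state them.

```python
import numpy as np

def lp_residual(frame, order=10):
    """LP residual of one frame (autocorrelation method; order is an assumption)."""
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], with a tiny ridge for numerical stability
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * (r[0] + 1.0) * np.eye(order)
    a = np.linalg.solve(R, r[1:])
    # Inverse filter A(z) = 1 - sum_k a_k z^{-k}; its output is the residual
    inv = np.concatenate(([1.0], -a))
    return np.convolve(frame, inv)[:n]

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced uniformly on the mel scale, from 0 Hz to sr/2."""
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel2hz(np.linspace(hz2mel(0.0), hz2mel(sr / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def r_mfcc(signal, sr=8000, frame_len=160, hop=80, order=10,
           n_filters=24, n_ceps=13, n_fft=256):
    """Cepstral coefficients of mel subband energies of the LP residual."""
    fb = mel_filterbank(n_filters, n_fft, sr)
    # Unnormalized DCT-II basis used for the cepstral analysis step
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (m + 0.5) / n_filters)
    window = np.hamming(frame_len)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        res = lp_residual(frame, order)
        power = np.abs(np.fft.rfft(res, n_fft)) ** 2   # residual power spectrum
        energies = fb @ power                           # mel subband energies
        feats.append(dct @ np.log(energies + 1e-10))    # log, then DCT
    return np.array(feats)
```

Calling `r_mfcc(signal)` on a one-dimensional array of samples returns one feature vector per frame. In a setup like the one the abstract describes, such vectors would be used to train one GMM per speaker; that modeling step and the way the R-MFCC and MFCC systems are combined are not specified in the abstract and are left out here.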