Integrating Complementary Features from Vocal Source and Vocal Tract for Speaker Identification

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 2007-09-01 DOI:10.30019/IJCLCLP.200709.0004

Nengheng Zheng, Tan Lee, Ning Wang, P. Ching


Citations: 2

Abstract

This paper describes a speaker identification system that uses complementary acoustic features derived from the vocal source excitation and the vocal tract system. Conventional speaker recognition systems typically adopt cepstral coefficients, e.g., Mel-frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC), as the representative features. These cepstral features aim to characterize the formant structure of the vocal tract system. This study proposes a new feature set, named the wavelet octave coefficients of residues (WOCOR), to characterize the vocal source excitation signal. WOCOR is derived by wavelet transformation of the linear predictive (LP) residual signal and is capable of capturing the spectro-temporal properties of the vocal source excitation. WOCOR and MFCC contain complementary information for speaker recognition because they characterize two physiologically distinct components of speech production. The complementary contributions of MFCC and WOCOR to speaker identification are investigated. A confidence-measure-based score-level fusion technique is proposed to take full advantage of these two complementary features. Experiments show that an identification system using both MFCC and WOCOR significantly outperforms one using MFCC only: compared with the identification error rate of 6.8% obtained with the MFCC-based system, the proposed confidence-measure-based integrated system obtains an error rate of 4.1%.
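The abstract names three processing steps: LP inverse filtering to obtain the residual, a wavelet transform of that residual, and score-level fusion of the two feature streams. The Python sketch below illustrates each step under loudly stated assumptions and is not the authors' implementation. The function names (`lp_residual`, `haar_octave_energies`, `fuse_scores`), the LP order of 12, the Haar wavelet, the four octave levels, and the fixed fusion weight are all hypothetical choices made for illustration. In particular, the per-octave energies are only a stand-in for WOCOR, and the fixed weight replaces the paper's confidence measure, whose definition is not given in the abstract.

```python
import numpy as np


def lp_residual(frame, order=12):
    """Inverse-filter a frame with its own LP coefficients.

    Uses the autocorrelation method; `order` is an assumed value.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], with a small ridge for stability.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += (1e-9 * r[0] + 1e-12) * np.eye(order)
    a = np.linalg.solve(R, r[1:])
    # Residual e[t] = s[t] - sum_k a[k] * s[t - k - 1].
    pred = np.zeros(n)
    for k in range(order):
        pred[k + 1:] += a[k] * frame[: n - k - 1]
    return frame - pred


def haar_octave_energies(x, levels=4):
    """Energy of Haar detail coefficients per octave (a stand-in for WOCOR)."""
    approx = np.asarray(x, dtype=float)
    energies = []
    for _ in range(levels):
        if len(approx) % 2:
            approx = approx[:-1]  # drop the last sample so samples pair up
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)
        approx = (even + odd) / np.sqrt(2.0)
        energies.append(float(np.sum(detail ** 2)))
    return np.array(energies)


def fuse_scores(scores_mfcc, scores_wocor, weight=0.5):
    """Fixed-weight score-level fusion (assumption: not the paper's
    confidence measure). Returns the best speaker index and fused scores."""
    fused = (weight * np.asarray(scores_mfcc, dtype=float)
             + (1.0 - weight) * np.asarray(scores_wocor, dtype=float))
    return int(np.argmax(fused)), fused
```

In this sketch, each speaker would receive one score per feature stream (for example, a log-likelihood from a model trained on that stream), and `fuse_scores` picks the speaker with the highest combined score. A confidence-based scheme would presumably vary `weight` per test utterance rather than fixing it.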

