Efficient feature extraction of speaker identification using phoneme mean F-ratio for Chinese

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI:10.1109/ISCSLP.2012.6423485

Chen Zhao, Hongcui Wang, Songgun Hyon, Jianguo Wei, J. Dang

Citations: 10

Abstract

Features used for speaker recognition should retain as much speaker-specific information as possible while attenuating linguistic information. To discard linguistic information effectively, this paper employs the phoneme mean F-ratio method to investigate the contributions of different frequency regions from the perspective of Chinese phonemes, and applies the method to speaker identification. We find that speaker-specific information is distributed across different frequency regions of the speech signal, depending on the phoneme. Based on the contribution rate of each region, we extract new features and combine them with a Gaussian mixture model (GMM). The speaker identification experiment is conducted on a King-ASR Chinese database. Compared with the MFCC feature, the proposed feature reduces the identification error rate by 32.94%. The results confirm the effectiveness of the phoneme mean F-ratio method for improving speaker recognition performance for Chinese.
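The abstract does not spell out how the phoneme mean F-ratio is computed. The sketch below is one plausible reading, not the authors' implementation: it assumes the commonly used F-ratio (variance of the per-speaker means divided by the average within-speaker variance, computed per frequency band) and assumes that "phoneme mean" denotes averaging these per-phoneme ratios over phonemes. The function names, the data layout, and the synthetic data are all hypothetical.

```python
import numpy as np

def f_ratio(energies_by_speaker):
    """Per-band F-ratio for one phoneme (assumed definition).

    energies_by_speaker: list of arrays, one per speaker, each of shape
    (n_frames, n_bands) holding band energies for frames of that phoneme.
    """
    # Mean band energy of each speaker: shape (n_speakers, n_bands).
    speaker_means = np.stack([e.mean(axis=0) for e in energies_by_speaker])
    # Between-speaker variance of those means, per band.
    between = ((speaker_means - speaker_means.mean(axis=0)) ** 2).mean(axis=0)
    # Within-speaker variance, averaged over speakers, per band.
    within = np.stack([e.var(axis=0) for e in energies_by_speaker]).mean(axis=0)
    return between / within

def phoneme_mean_f_ratio(data):
    """Average the per-phoneme F-ratios over all phonemes (assumed meaning).

    data: dict mapping a phoneme label to its list of per-speaker arrays.
    """
    return np.stack([f_ratio(spk) for spk in data.values()]).mean(axis=0)

# Synthetic illustration: band 0 carries a speaker-dependent offset,
# band 1 is pure noise shared by all speakers.
rng = np.random.default_rng(0)
offsets = [0.0, 5.0, 10.0]  # hypothetical per-speaker shifts in band 0
data = {
    phoneme: [
        rng.normal(size=(200, 2)) + np.array([off, 0.0]) for off in offsets
    ]
    for phoneme in ("a", "i")
}
scores = phoneme_mean_f_ratio(data)
print(scores)  # band 0 should score far higher than band 1
```

Under these assumptions, a band with a high score separates speakers well relative to its within-speaker spread, so it would be a candidate for finer resolution in a feature front end.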


