Age and gender recognition for telephone applications based on GMM supervectors and support vector machines

2008 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2008-05-12 DOI:10.1109/ICASSP.2008.4517932

T. Bocklet, A. Maier, Josef G. Bauer, F. Burkhardt, E. Nöth

Citations: 119

Abstract

This paper compares two approaches to automatic age and gender classification with 7 classes. The first approach uses Gaussian mixture models (GMMs) with universal background models (UBMs), a technique well known from speaker identification and verification. Training is performed by the EM algorithm or MAP adaptation, respectively. In the second approach, a GMM is trained for each speaker in the training and test sets. The means of each model are extracted and concatenated, which yields one GMM supervector per speaker. These supervectors are then used as input to a support vector machine (SVM). Three different kernels were employed for the SVM approach: a polynomial kernel (of different degrees), an RBF kernel, and a linear GMM distance kernel based on the KL divergence. With the SVM approach we improved the recognition rate to 74% (p < 0.001), which is in the same range as human performance.
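
The supervector pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: each speaker model is obtained by MAP-adapting only the means of a shared diagonal-covariance UBM, the relevance factor of 16 is a hypothetical default, the two-component UBM and the synthetic frames are toy data, and the kernel is one commonly used form of the KL-divergence-based linear kernel, in which each mean is scaled by the square root of its mixture weight and by the inverse standard deviation. All function names are illustrative.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt only the UBM means to one speaker's feature frames.

    The relevance factor is an assumed default, not a value from the paper.
    """
    n_comp, dim = len(means), len(means[0])
    occ = [0.0] * n_comp                          # soft counts per component
    first = [[0.0] * dim for _ in range(n_comp)]  # posterior-weighted sums
    for x in frames:
        logp = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logp)                           # subtract max for stability
        post = [math.exp(lp - top) for lp in logp]
        norm = sum(post)
        for i in range(n_comp):
            gamma = post[i] / norm
            occ[i] += gamma
            for d in range(dim):
                first[i][d] += gamma * x[d]
    # Same as alpha * E[x] + (1 - alpha) * mu with alpha = n / (n + relevance).
    return [
        [(first[i][d] + relevance * means[i][d]) / (occ[i] + relevance)
         for d in range(dim)]
        for i in range(n_comp)
    ]


def supervector(adapted_means):
    """Concatenate all component means into one flat vector."""
    return [value for mean in adapted_means for value in mean]


def kl_linear_kernel(means_a, means_b, weights, variances):
    """Assumed kernel form: sum_i w_i * mu_a_i^T Sigma_i^{-1} mu_b_i."""
    return sum(
        w * a * b / v
        for w, mean_a, mean_b, var in zip(weights, means_a, means_b, variances)
        for a, b, v in zip(mean_a, mean_b, var)
    )


# Toy UBM (hypothetical values): 2 components, 2-dimensional features.
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [5.0, 5.0]]
ubm_vars = [[1.0, 1.0], [1.0, 1.0]]

# Synthetic frames standing in for the acoustic features of two speakers.
rng = random.Random(0)
frames_a = [[rng.gauss(1.0, 0.1), rng.gauss(1.0, 0.1)] for _ in range(50)]
frames_b = [[rng.gauss(4.0, 0.1), rng.gauss(4.0, 0.1)] for _ in range(50)]

means_a = map_adapt_means(frames_a, ubm_weights, ubm_means, ubm_vars)
means_b = map_adapt_means(frames_b, ubm_weights, ubm_means, ubm_vars)
sv_a = supervector(means_a)
sv_b = supervector(means_b)

# Gram matrix over the two speakers, as an SVM with a custom kernel would need.
speakers = [means_a, means_b]
gram = [[kl_linear_kernel(p, q, ubm_weights, ubm_vars) for q in speakers]
        for p in speakers]
```

A Gram matrix of this kind could be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. The polynomial and RBF kernels mentioned in the abstract would instead operate directly on the flat supervectors `sv_a` and `sv_b`.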


