Age and gender recognition for telephone applications based on GMM supervectors and support vector machines

2008 IEEE International Conference on Acoustics, Speech and Signal Processing. Publication date: 2008-05-12. DOI: 10.1109/ICASSP.2008.4517932

T. Bocklet, A. Maier, Josef G. Bauer, F. Burkhardt, E. Nöth

Citations: 119

Abstract

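This paper compares two approaches to automatic age and gender classification with 7 classes. The first approach uses Gaussian mixture models (GMMs) with universal background models (UBMs), a technique well known from speaker identification and verification. The models are trained either with the EM algorithm or by MAP adaptation. In the second approach, a GMM is trained for each speaker in the training and test sets. The means of each model are extracted and concatenated, yielding one GMM supervector per speaker. These supervectors are then classified with a support vector machine (SVM). Three kernels were used for the SVM approach: a polynomial kernel (with different polynomial degrees), an RBF kernel, and a linear GMM distance kernel based on the KL divergence. With the SVM approach, the recognition rate improved to 74% (p < 0.001), which is in the same range as human listeners.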
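The supervector pipeline summarized in the abstract can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a UBM with diagonal covariances, assumes each speaker GMM is obtained by relevance-MAP adaptation of the UBM means only (the relevance factor `r = 16` is an assumed value), and implements the KL-divergence-based linear GMM kernel in its common form K(a, b) = sum_i w_i mu_i^a' Sigma_i^-1 mu_i^b. The polynomial and RBF kernel parameters (`degree`, `c`, `gamma`) are hypothetical defaults, and all function names are illustrative.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of a Gaussian with diagonal covariance, evaluated at x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def map_adapt_means(frames: Sequence[Sequence[float]], weights: Sequence[float],
                    means: Matrix, variances: Matrix, relevance: float = 16.0) -> Matrix:
    """Assumed method: relevance-MAP adaptation of the UBM means only.

    Weights and variances stay fixed; relevance=16 is an assumed default.
    """
    n_comp, dim = len(means), len(means[0])
    counts = [0.0] * n_comp                      # soft counts n_i
    sums = [[0.0] * dim for _ in range(n_comp)]  # first-order statistics sum_t gamma_t(i) x_t
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)                          # log-sum-exp trick for stable posteriors
        total = sum(math.exp(lp - top) for lp in logs)
        for i, lp in enumerate(logs):
            gamma = math.exp(lp - top) / total   # posterior of component i given frame x
            counts[i] += gamma
            for d in range(dim):
                sums[i][d] += gamma * x[d]
    # mu_hat_i = (sum_t gamma_t(i) x_t + r * mu_i) / (n_i + r); equals mu_i when n_i = 0
    return [[(sums[i][d] + relevance * means[i][d]) / (counts[i] + relevance)
             for d in range(dim)] for i in range(n_comp)]


def supervector(adapted_means: Matrix) -> List[float]:
    """Concatenate the adapted mean vectors into a single GMM supervector."""
    return [value for mean in adapted_means for value in mean]


def kl_linear_kernel(sv_a: Sequence[float], sv_b: Sequence[float],
                     weights: Sequence[float], variances: Matrix) -> float:
    """K(a, b) = sum_i w_i * mu_i^a' Sigma_i^-1 mu_i^b for diagonal UBM covariances."""
    dim = len(variances[0])
    total = 0.0
    for i, (w, var) in enumerate(zip(weights, variances)):
        for d in range(dim):
            k = i * dim + d                      # index of dimension d of component i
            total += w * sv_a[k] * sv_b[k] / var[d]
    return total


def poly_kernel(a: Sequence[float], b: Sequence[float], degree: int = 2, c: float = 1.0) -> float:
    """Polynomial kernel (a . b + c)^degree on raw supervectors (hypothetical defaults)."""
    return (sum(x * y for x, y in zip(a, b)) + c) ** degree


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float = 0.1) -> float:
    """RBF kernel exp(-gamma * ||a - b||^2) on raw supervectors (hypothetical gamma)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    # Toy 2-component, 2-D "UBM"; the numbers are illustrative, not from the paper.
    ubm_w = [0.5, 0.5]
    ubm_mu = [[0.0, 0.0], [3.0, 3.0]]
    ubm_var = [[1.0, 1.0], [1.0, 1.0]]
    speakers = {
        "spk_a": [[0.2, -0.1], [0.4, 0.3], [2.8, 3.1]],
        "spk_b": [[3.2, 2.9], [2.7, 3.3], [0.1, 0.2]],
    }
    svs = {name: supervector(map_adapt_means(frames, ubm_w, ubm_mu, ubm_var))
           for name, frames in speakers.items()}
    names = sorted(svs)
    # Gram matrix that an SVM with a precomputed kernel could consume.
    gram = [[kl_linear_kernel(svs[a], svs[b], ubm_w, ubm_var) for b in names] for a in names]
    for name, row in zip(names, gram):
        print(name, ["%.3f" % value for value in row])
```

Because the KL-based kernel is linear in the supervectors once each dimension is scaled by sqrt(w_i) / sigma_id, the resulting Gram matrix could be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, which handles the multi-class case internally. With no adaptation data, the adapted means fall back to the UBM means, which is why the update is written as a weighted sum rather than as a division by n_i.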
