T. Bocklet, A. Maier, Josef G. Bauer, F. Burkhardt, E. Nöth
{"title":"基于GMM超向量和支持向量机的电话应用年龄和性别识别","authors":"T. Bocklet, A. Maier, Josef G. Bauer, F. Burkhardt, E. Nöth","doi":"10.1109/ICASSP.2008.4517932","DOIUrl":null,"url":null,"abstract":"This paper compares two approaches of automatic age and gender classification with 7 classes. The first approach are Gaussian mixture models (GMMs) with universal background models (UBMs), which is well known for the task of speaker identification/verification. The training is performed by the EM algorithm or MAP adaptation respectively. For the second approach for each speaker of the test and training set a GMM model is trained. The means of each model are extracted and concatenated, which results in a GMM supervector for each speaker. These supervectors are then used in a support vector machine (SVM). Three different kernels were employed for the SVM approach: a polynomial kernel (with different polynomials), an RBF kernel and a linear GMM distance kernel, based on the KL divergence. With the SVM approach we improved the recognition rate to 74% (p < 0.001) and are in the same range as humans.","PeriodicalId":333742,"journal":{"name":"2008 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2008-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"119","resultStr":"{\"title\":\"Age and gender recognition for telephone applications based on GMM supervectors and support vector machines\",\"authors\":\"T. Bocklet, A. Maier, Josef G. Bauer, F. Burkhardt, E. Nöth\",\"doi\":\"10.1109/ICASSP.2008.4517932\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper compares two approaches of automatic age and gender classification with 7 classes. The first approach are Gaussian mixture models (GMMs) with universal background models (UBMs), which is well known for the task of speaker identification/verification. The training is performed by the EM algorithm or MAP adaptation respectively. For the second approach for each speaker of the test and training set a GMM model is trained. The means of each model are extracted and concatenated, which results in a GMM supervector for each speaker. These supervectors are then used in a support vector machine (SVM). Three different kernels were employed for the SVM approach: a polynomial kernel (with different polynomials), an RBF kernel and a linear GMM distance kernel, based on the KL divergence. With the SVM approach we improved the recognition rate to 74% (p < 0.001) and are in the same range as humans.\",\"PeriodicalId\":333742,\"journal\":{\"name\":\"2008 IEEE International Conference on Acoustics, Speech and Signal Processing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2008-05-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"119\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2008 IEEE International Conference on Acoustics, Speech and Signal Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICASSP.2008.4517932\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 IEEE International Conference on Acoustics, Speech and Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP.2008.4517932","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Age and gender recognition for telephone applications based on GMM supervectors and support vector machines
This paper compares two approaches of automatic age and gender classification with 7 classes. The first approach are Gaussian mixture models (GMMs) with universal background models (UBMs), which is well known for the task of speaker identification/verification. The training is performed by the EM algorithm or MAP adaptation respectively. For the second approach for each speaker of the test and training set a GMM model is trained. The means of each model are extracted and concatenated, which results in a GMM supervector for each speaker. These supervectors are then used in a support vector machine (SVM). Three different kernels were employed for the SVM approach: a polynomial kernel (with different polynomials), an RBF kernel and a linear GMM distance kernel, based on the KL divergence. With the SVM approach we improved the recognition rate to 74% (p < 0.001) and are in the same range as humans.