Multitask speaker profiling for estimating age, height, weight and smoking habits from spontaneous telephone speech signals

2014 4th International Conference on Computer and Knowledge Engineering (ICCKE), published 2014-12-22, DOI: 10.1109/ICCKE.2014.6993339

A. H. Poorjam, M. H. Bahari, H. Van hamme

Citations: 24

Abstract

This paper proposes a novel approach for automatically estimating four important speaker traits, namely age, height, weight and smoking habit, from speech signals. In this method, each utterance is modeled using two frameworks: the i-vector framework, which is based on factor analysis of Gaussian Mixture Model (GMM) mean supervectors, and the Non-negative Factor Analysis (NFA) framework, which is based on a constrained factor analysis of GMM weights. Artificial Neural Networks (ANNs) and Least Squares Support Vector Regression (LSSVR) are then employed to estimate the age, height and weight of speakers from given utterances, while ANNs and logistic regression (LR) are used for smoking habit detection. Since GMM weights provide information complementary to GMM means, a score-level fusion of the i-vector-based and NFA-based recognizers is applied to the age and smoking habit estimation tasks to improve performance. In addition, a multitask speaker profiling approach is proposed to handle the correlated tasks simultaneously and in interaction with each other, and thereby to boost the accuracy of speaker age, height, weight and smoking habit estimation. To this end, a hybrid architecture involving the score-level fusion of the i-vector-based and NFA-based recognizers is proposed to exploit the information available in both Gaussian means and Gaussian weights. ANNs are then employed to share the learned information across all tasks while they are learned in parallel. The proposed method is evaluated on telephone speech signals from the National Institute of Standards and Technology (NIST) 2008 and 2010 Speaker Recognition Evaluation (SRE) corpora. Experimental results over 1194 utterances show the effectiveness of the proposed method for automatic speaker profiling.
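LSSVR is one of the regressors the abstract names for estimating age, height and weight from fixed-length utterance representations such as i-vectors or NFA vectors. The Python/NumPy sketch below shows a standard LSSVR formulation as a rough illustration only; it is not the authors' implementation. The RBF kernel, the hyperparameter values `gamma` and `sigma`, the class name `LSSVR`, and the synthetic data in the demo are all assumptions made for this example.

```python
import numpy as np


def rbf_kernel(A: np.ndarray, B: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


class LSSVR:
    """Least Squares Support Vector Regression with an RBF kernel (sketch).

    Training solves the (n+1) x (n+1) linear system
        [ 0   1^T           ] [ b     ]   [ 0 ]
        [ 1   K + I / gamma ] [ alpha ] = [ y ]
    and prediction is f(x) = sum_i alpha_i * k(x, x_i) + b.
    """

    def __init__(self, gamma: float = 10.0, sigma: float = 1.0):
        self.gamma = gamma  # regularization constant (assumed, tune on dev data)
        self.sigma = sigma  # RBF kernel width (assumed, tune on dev data)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "LSSVR":
        n = X.shape[0]
        K = rbf_kernel(X, X, self.sigma)
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = 1.0  # first row enforces sum(alpha) == 0
        A[1:, 0] = 1.0  # bias column
        A[1:, 1:] = K + np.eye(n) / self.gamma
        rhs = np.concatenate(([0.0], np.asarray(y, dtype=float)))
        solution = np.linalg.solve(A, rhs)
        self.b = solution[0]
        self.alpha = solution[1:]
        self.X_train = X
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return rbf_kernel(X, self.X_train, self.sigma) @ self.alpha + self.b


if __name__ == "__main__":
    # Synthetic stand-ins for utterance vectors and speaker ages (hypothetical data).
    rng = np.random.default_rng(1)
    X_all = rng.normal(size=(120, 10))
    ages = 40.0 + 6.0 * X_all[:, 0] - 4.0 * X_all[:, 1] + rng.normal(scale=1.0, size=120)
    X_train, y_train = X_all[:100], ages[:100]
    X_test, y_test = X_all[100:], ages[100:]
    model = LSSVR(gamma=100.0, sigma=4.0).fit(X_train, y_train)
    mae = np.mean(np.abs(model.predict(X_test) - y_test))
    print(f"test MAE: {mae:.2f} years")
```

Because LSSVR replaces the SVR inequality constraints with equality constraints and a squared-error loss, training reduces to a single linear solve rather than a quadratic program. In practice `gamma` and `sigma` would be selected on a development set, and the same model could be trained separately on i-vectors and on NFA vectors before any score-level fusion.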

