Speech Emotion Recognition System Using Gaussian Mixture Model and Improvement proposed via Boosted GMM

IRA-International Journal of Technology & Engineering. Pub Date: 2017-07-10. DOI: 10.21013/JTE.ICSESD201706

P. Patel, A. Chaudhari, M. A. Pund, D. Deshmukh


Citations: 15


Speech Emotion Recognition System Using Gaussian Mixture Model and Improvement proposed via Boosted GMM

Speech emotion recognition is an important problem in human-machine interaction. Automatic recognition of emotion in speech aims to identify the underlying emotional state of a speaker from the speech signal. Gaussian mixture models (GMMs) and the minimum-error-rate classifier (i.e., the Bayes optimal classifier) are popular and effective tools for speech emotion recognition. Typically, GMMs model the class-conditional distributions of acoustic features, and their parameters are estimated with the expectation-maximization (EM) algorithm on a training data set. In this paper, we introduce a boosting algorithm for reliably and accurately estimating the class-conditional GMMs. The resulting algorithm is named the Boosted-GMM algorithm. Our speech emotion recognition experiments show that the Boosted-GMM algorithm significantly improves emotion recognition rates compared with the EM-GMM algorithm.

During an interaction, people have feelings that they want to convey to their communication partner, who may be a human or a machine. This work depends on recognizing a person's emotion from the speech signal, which is very difficult for the following reasons. First, different sentences, speakers, speaking styles, and speaking rates introduce acoustic variability. Second, the same utterance may show different emotions, which makes it very difficult to differentiate these portions of the utterance. Third, emotional expression depends on the speaker and on his or her culture and environment: as the culture and environment change, the speaking style changes too, which is a further challenge for a speech emotion recognition system.
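The EM-GMM baseline described in the abstract can be illustrated with a minimal sketch: one GMM is fitted per emotion class with EM, and a sample is assigned to the class with the highest prior-weighted likelihood (the minimum-error-rate rule). This is not the paper's Boosted-GMM algorithm, whose details the abstract does not give. The single scalar feature, the class labels `neutral` and `angry`, the synthetic training data, and the function names `fit_gmm_em` and `classify` are all assumptions made for illustration; a real system would use multidimensional acoustic features.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_em(data, n_components=2, n_iter=50, var_floor=1e-3):
    """Fit a 1-D GMM with plain EM; returns a list of (weight, mean, var)."""
    srt = sorted(data)
    n = len(srt)
    # Deterministic initialisation: place the means at evenly spaced quantiles.
    means = [srt[(2 * k + 1) * n // (2 * n_components)] for k in range(n_components)]
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, var_floor)
    variances = [overall_var] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means and variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(spread, var_floor)  # floor avoids collapsing components
    return list(zip(weights, means, variances))


def log_likelihood(x, gmm):
    """Log of the mixture density at x (small constant guards against log(0))."""
    return math.log(sum(w * gaussian_pdf(x, m, v) for w, m, v in gmm) + 1e-300)


def classify(x, class_gmms, priors):
    """Minimum-error-rate decision: argmax over classes of log prior + log likelihood."""
    return max(class_gmms,
               key=lambda c: math.log(priors[c]) + log_likelihood(x, class_gmms[c]))


# Synthetic, hypothetical 1-D feature values for two emotion classes.
rng = random.Random(0)
train = {
    "neutral": [rng.gauss(120.0, 8.0) for _ in range(200)]
               + [rng.gauss(140.0, 6.0) for _ in range(200)],
    "angry": [rng.gauss(200.0, 12.0) for _ in range(200)]
             + [rng.gauss(230.0, 10.0) for _ in range(200)],
}

# One class-conditional GMM per emotion, with priors from class frequencies.
class_gmms = {label: fit_gmm_em(samples) for label, samples in train.items()}
n_total = sum(len(samples) for samples in train.values())
priors = {label: len(samples) / n_total for label, samples in train.items()}

print(classify(125.0, class_gmms, priors))
print(classify(215.0, class_gmms, priors))
```

On this well-separated synthetic data the two calls print `neutral` and `angry`. The variance floor is a common safeguard against degenerate components and is a design choice of this sketch, not something stated in the source.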
