An Efficient Speech Emotion Recognition Using Ensemble Method of Supervised Classifiers

2020 Emerging Technology in Computing, Communication and Electronics (ETCCE) Pub Date: 2020-12-21 DOI: 10.1109/ETCCE51779.2020.9350913

Nasifa Tanjin Ira, Mohammad Osiur Rahman


Citations: 4

Abstract

Speech is the most natural way of expressing ourselves. Speech Emotion Recognition has become a significant research field with many present-day applications. The main aim of such research is to improve human-machine interaction. This work focuses on recognizing several emotions from audio speech by extracting features and using them to classify emotions. Eight emotions are classified: neutral, fearful, happy, angry, sad, disgust, calm, and surprised. Features are extracted using Mel Frequency Cepstral Coefficients (MFCCs). Six supervised classifiers are used for classification: multilayer perceptron (MLP), Random Forest (RF), AdaBoost, support vector machine (SVM), Gradient Boosting (GB), and Hist Gradient Boosting (HGB). A new method for emotion recognition from speech is proposed. The accuracies of MLP, AdaBoost, SVM, Random Forest, Gradient Boosting, and Hist Gradient Boosting are 53%, 32%, 54%, 58%, 56%, and 59%, respectively. Further investigation used an ensemble of RF, GB, and HGB, which achieved an accuracy of 70%. Precision, recall, and F1-score were calculated to evaluate performance. Finally, a comparative analysis of the six classifiers and other existing methods indicates that the ensemble method is one of the best alternative techniques for emotion recognition from speech.
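The abstract does not say how the RF, GB, and HGB outputs are combined or how the metrics are averaged, so the following is only a minimal sketch under stated assumptions: hard (majority) voting with ties going to the first model, and macro-averaged precision, recall, and F1. The labels and per-model predictions are made-up toy values, not data or results from the paper, and the function names are hypothetical.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists by hard (majority) voting.

    Assumption: ties are broken in favour of the earliest model in the list.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        best = max(counts.values())
        # Pick the first vote (in model order) that reaches the top count.
        combined.append(next(v for v in votes if counts[v] == best))
    return combined

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_precision_recall_f1(y_true, y_pred):
    """Per-class precision, recall, and F1, averaged with equal class weight."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Toy ground truth and toy predictions standing in for RF, GB, and HGB.
y_true = ["happy", "sad", "angry", "calm", "happy", "sad", "calm"]
rf_pred = ["happy", "sad", "calm", "calm", "sad", "sad", "sad"]
gb_pred = ["happy", "angry", "angry", "calm", "happy", "happy", "calm"]
hgb_pred = ["sad", "sad", "angry", "happy", "happy", "sad", "happy"]

ensemble_pred = majority_vote([rf_pred, gb_pred, hgb_pred])

for name, pred in [("RF", rf_pred), ("GB", gb_pred),
                   ("HGB", hgb_pred), ("Ensemble", ensemble_pred)]:
    p, r, f1 = macro_precision_recall_f1(y_true, pred)
    print(f"{name}: acc={accuracy(y_true, pred):.2f} "
          f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this toy setup each model errs on different samples, so the vote corrects most individual mistakes; that is the general intuition behind an ensemble outperforming its members. Soft voting over predicted class probabilities would be an equally plausible reading of the abstract.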

