An Efficient Speech Emotion Recognition Using Ensemble Method of Supervised Classifiers
Nasifa Tanjin Ira, Mohammad Osiur Rahman
2020 Emerging Technology in Computing, Communication and Electronics (ETCCE), 2020-12-21
https://doi.org/10.1109/ETCCE51779.2020.9350913
Speech is the most natural way of expressing ourselves. Speech Emotion Recognition has become a significant research field with many applications, and its main goal is to improve human-machine interaction. This research focuses on recognizing several emotions from audio speech by extracting features and using them to classify emotions. Eight emotions are classified: neutral, fearful, happy, angry, sad, disgust, calm and surprised. Features are extracted using Mel Frequency Cepstral Coefficients (MFCCs). Six supervised classifiers are used for classification: multilayer perceptron (MLP), Random Forest (RF), AdaBoost, support vector machine (SVM), Gradient Boosting (GB), and Hist Gradient Boosting (HGB). In this research, a new method for emotion recognition from speech is proposed. The accuracy rates of MLP, AdaBoost, SVM, Random Forest, Gradient Boosting, and Hist Gradient Boosting are 53%, 32%, 54%, 58%, 56% and 59%, respectively. Further investigation was performed using an ensemble method consisting of RF, GB and HGB, which obtained an accuracy rate of 70%. Precision, Recall and F1-Score were also calculated to evaluate performance. Finally, a comparative analysis among the six classifiers and other existing methods shows that the Ensemble Method is one of the best alternative techniques for emotion recognition from speech.
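The abstract does not say how the ensemble combines the outputs of RF, GB and HGB, so the following is only a minimal sketch that assumes hard (majority) voting. It also shows how accuracy and per-class Precision, Recall and F1-Score can be computed from predicted labels. The function names (`majority_vote`, `per_class_scores`, `accuracy`) and the toy label lists are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists by hard (majority) voting.

    predictions: list of equal-length label lists, one per classifier.
    Ties go to the vote of the earliest classifier in the list
    (an assumption; the abstract does not describe tie-breaking).
    """
    combined = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        combined.append(next(v for v in votes if counts[v] == top))
    return combined

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores

# Made-up toy labels for illustration only (not results from the paper).
y_true = ["happy", "sad", "angry", "calm", "happy", "sad", "calm", "angry"]
rf = ["happy", "sad", "sad", "calm", "angry", "sad", "neutral", "angry"]
gb = ["happy", "calm", "angry", "calm", "happy", "happy", "neutral", "sad"]
hgb = ["sad", "sad", "angry", "neutral", "happy", "sad", "calm", "happy"]

ensemble = majority_vote([rf, gb, hgb])
scores = per_class_scores(y_true, ensemble)

for name, pred in [("RF", rf), ("GB", gb), ("HGB", hgb), ("Ensemble", ensemble)]:
    print(name, accuracy(y_true, pred))
```

In this toy data the three classifiers make mostly different mistakes, which is the situation in which voting can outperform each individual member. Soft voting over predicted class probabilities would be an equally plausible reading of the abstract.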