Emotion Detection from Speech Signals using Voting Mechanism on Classified Frames

Adib Ashfaq A. Zamil, Sajib Hasan, Showmik MD. Jannatul Baki, Jawad MD. Adam, Isra Zaman
{"title":"Emotion Detection from Speech Signals using Voting Mechanism on Classified Frames","authors":"Adib Ashfaq A. Zamil, Sajib Hasan, Showmik MD. Jannatul Baki, Jawad MD. Adam, Isra Zaman","doi":"10.1109/ICREST.2019.8644168","DOIUrl":null,"url":null,"abstract":"Understanding human emotion is a complicated task for humans themselves, however, this did not stop the researchers from trying to make machines capable of understanding human emotions. Many approaches have been followed, using speech signals to detect emotions has been popular among these approaches. In this study, Mel Frequency Cepstrum Coefficient (MFCC) features were extracted from speech signals to detect the underlying emotion of the speech. Extracted features were used to classify different emotions using LMT classifier. For each frame of a speech signal, 13-dimensional feature vectors were extracted and Logistic Model Tree (LMT) models were trained using these features. For classifying an unknown speech signal, the 13-dimensional frame features are first extracted from the signal and each frame is classified using the trained model. Using a voting mechanism on the classified frames, the emotion of the speech signal is detected. Experimental results on two datasets- Berlin Database of Emotional Speech (Emo-DB) and Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) show that our approach works very well in classifying certain emotions while it struggles to discern the differences between some pairs of emotions. Among the trained models, the maximum accuracy achieved was 70% in detecting 7 different emotions. Considering the small dimension size of the feature vectors used, this approach provides an efficient solution to classifying different emotions using speech signals.","PeriodicalId":108842,"journal":{"name":"2019 International Conference on Robotics,Electrical and Signal Processing Techniques (ICREST)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"44","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on Robotics,Electrical and Signal Processing Techniques (ICREST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICREST.2019.8644168","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 44

Abstract

Understanding human emotion is a complicated task even for humans themselves; however, this has not stopped researchers from trying to make machines capable of understanding human emotions. Many approaches have been explored, and using speech signals to detect emotions has been popular among them. In this study, Mel Frequency Cepstrum Coefficient (MFCC) features were extracted from speech signals to detect the underlying emotion of the speech. For each frame of a speech signal, a 13-dimensional feature vector was extracted, and Logistic Model Tree (LMT) models were trained on these features. To classify an unknown speech signal, the 13-dimensional frame features are first extracted from the signal and each frame is classified using the trained model; a voting mechanism over the classified frames then determines the emotion of the speech signal. Experimental results on two datasets, the Berlin Database of Emotional Speech (Emo-DB) and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), show that our approach works very well in classifying certain emotions while struggling to discern the differences between some pairs of emotions. Among the trained models, the maximum accuracy achieved was 70% in detecting 7 different emotions. Considering the small dimensionality of the feature vectors used, this approach provides an efficient solution for classifying emotions from speech signals.
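
The pipeline the abstract describes (per-frame 13-dimensional MFCC extraction, frame-level classification, and majority voting over the frame labels) can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes librosa for MFCC extraction, and scikit-learn's LogisticRegression stands in for the Logistic Model Tree (LMT) classifier, which is a Weka model with no standard Python equivalent.

# A minimal sketch of the frame-voting pipeline, assuming librosa for MFCC
# extraction. LogisticRegression is a stand-in for the paper's Logistic
# Model Tree (LMT), a Weka classifier with no standard scikit-learn port.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def frame_features(path, sr=16000, n_mfcc=13):
    # One 13-dimensional MFCC vector per frame, shape (n_frames, 13).
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_frame_classifier(clip_paths, clip_labels):
    # Every frame inherits the emotion label of its clip.
    feats = [frame_features(p) for p in clip_paths]
    X = np.vstack(feats)
    y = np.concatenate([[lab] * f.shape[0] for f, lab in zip(feats, clip_labels)])
    return LogisticRegression(max_iter=1000).fit(X, y)

def predict_emotion(clf, path):
    # Classify each frame, then majority-vote over the frame labels.
    frame_preds = clf.predict(frame_features(path))
    labels, counts = np.unique(frame_preds, return_counts=True)
    return labels[np.argmax(counts)]

In use, train_frame_classifier would be fed the Emo-DB or RAVDESS clip paths and their emotion labels; predict_emotion then returns the majority emotion across the classified frames of a held-out clip.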