A novel approach of speech emotion recognition with prosody, quality and derived features using SVM classifier for a class of North-Eastern Languages

2015 IEEE 2nd International Conference on Recent Trends in Information Systems (ReTIS) Pub Date : 2015-07-09 DOI:10.1109/ReTIS.2015.7232907

Amiya Samantaray, K. Mahapatra, Bibek Kabi, A. Routray


Citations: 17

Abstract

Speech emotion recognition is one of the recent challenges in speech processing and Human Computer Interaction (HCI), driven by the operational needs of real-world applications. Besides facial expressions, speech has proven to be one of the most valuable modalities for the automatic recognition of human emotions. Speech is a spontaneous medium through which emotions are perceived, and it provides in-depth information about the different cognitive states of a human being. In this context, a novel approach is introduced that combines prosody features (pitch, energy and zero-crossing rate), quality features (e.g. formant frequencies and spectral features), derived features (Mel-Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding Coefficients (LPCC)) and a dynamic feature (Mel-Energy spectrum Dynamic Coefficients (MEDC)) for robust automatic recognition of a speaker's emotional state. A multilevel SVM classifier is used to identify seven discrete emotional states, namely anger, disgust, fear, happy, neutral, sad and surprise, in five native Assamese languages. The experiments show that the combined feature set achieves an average accuracy of 82.26% in speaker-independent cases.
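The abstract lists pitch, energy and zero-crossing rate as the prosody features. The pure-Python code below is a minimal sketch of how such frame-level prosody features are commonly computed and pooled into a fixed-length vector that a classifier such as an SVM can consume. It is not the authors' implementation: the function names are hypothetical, and the 25 ms frame / 10 ms hop, the naive autocorrelation pitch estimator and the mean/standard-deviation pooling are all assumptions, since the abstract does not specify them. The quality, derived and dynamic features (formants, MFCC, LPCC, MEDC) and the multilevel SVM itself are not covered.

```python
import math
import statistics


def frames(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Short-time energy: sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def autocorr_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Crude pitch (F0) estimate in Hz from the autocorrelation peak.

    Only lags corresponding to [fmin, fmax] are searched. Returns 0.0 when no
    positive peak is found (e.g. a silent frame), which is treated as unvoiced.
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        val = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0


def utterance_features(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Pool frame-level prosody features into utterance-level mean/std values.

    The framing defaults and the mean/std statistics are assumptions, not
    settings reported in the paper.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    energy, zcr, pitch = [], [], []
    for f in frames(signal, frame_len, hop):
        energy.append(short_time_energy(f))
        zcr.append(zero_crossing_rate(f))
        f0 = autocorr_pitch(f, sample_rate)
        if f0 > 0:  # only voiced frames contribute to pitch statistics
            pitch.append(f0)
    feats = {}
    for name, values in (("energy", energy), ("zcr", zcr), ("pitch", pitch)):
        feats[name + "_mean"] = statistics.mean(values) if values else 0.0
        feats[name + "_std"] = statistics.pstdev(values) if values else 0.0
    return feats


if __name__ == "__main__":
    # Demo on a synthetic 200 Hz tone (0.5 s at 8 kHz), not real speech.
    sr = 8000
    tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(sr // 2)]
    print(utterance_features(tone, sr))
```

For the synthetic 200 Hz tone the pooled pitch mean should come out at about 200 Hz. In a full system along the lines the abstract describes, such a prosody vector would presumably be concatenated with the spectral and cepstral features before classification; a practical implementation would also likely use a dedicated pitch tracker rather than brute-force autocorrelation.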



