A novel approach of speech emotion recognition with prosody, quality and derived features using SVM classifier for a class of North-Eastern Languages

Amiya Samantaray, K. Mahapatra, Bibek Kabi, A. Routray
Published in: 2015 IEEE 2nd International Conference on Recent Trends in Information Systems (ReTIS), 2015-07-09
DOI: 10.1109/ReTIS.2015.7232907 (https://doi.org/10.1109/ReTIS.2015.7232907)
Citations: 17

Abstract

Speech emotion recognition is a current challenge in speech processing and human-computer interaction (HCI), driven by the operational needs of real-world applications. Alongside facial expressions, speech has proven to be one of the most valuable modalities for automatic recognition of human emotions. Speech is a spontaneous medium through which emotions are perceived, and it carries detailed information about a person's cognitive state. In this context, a novel approach is introduced that combines prosody features (pitch, energy, zero-crossing rate), quality features (formant frequencies, spectral features, etc.), derived features (Mel-frequency cepstral coefficients (MFCC), linear predictive coding coefficients (LPCC)), and a dynamic feature (Mel-energy spectrum dynamic coefficients (MEDC)) for robust automatic recognition of a speaker's emotional state. A multilevel SVM classifier is used to identify seven discrete emotional states (anger, disgust, fear, happy, neutral, sad, and surprise) in 'five native Assamese languages'. The experiments showed that the combined feature set achieved an average accuracy of 82.26% in the speaker-independent case.
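As a rough illustration of two of the prosody features named in the abstract (energy and zero-crossing rate), the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the 16 kHz sampling rate, and the 25 ms frame with a 10 ms hop are all assumptions chosen for the example, and the input is a synthetic tone rather than speech.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def prosody_features(samples, frame_len=400, hop=160):
    """Return one (energy, zcr) pair per frame.

    The defaults correspond to 25 ms frames and a 10 ms hop at 16 kHz,
    which are assumed values, not taken from the paper.
    """
    frames = frame_signal(samples, frame_len, hop)
    return [(short_time_energy(f), zero_crossing_rate(f)) for f in frames]

# Hypothetical input: 100 ms of a 200 Hz sine sampled at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate)
        for n in range(sample_rate // 10)]
features = prosody_features(tone)
```

In a full pipeline, per-frame values like these would typically be summarized per utterance (for example by mean and variance) and concatenated with the other feature groups before being passed to the classifier; that aggregation step is likewise an assumption here.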