Emotion detection with hybrid voice quality and prosodic features using Neural Network
Inshirah Idris, M. Salam
2014 4th World Congress on Information and Communication Technologies (WICT 2014), April 2014
DOI: 10.1109/WICT.2014.7076906
Citations: 8
Abstract
This paper investigates the detection of speech emotion using different sets of voice quality, prosodic, and hybrid features. A total of five feature sets are evaluated: two voice quality sets, one prosodic set, and two hybrid sets. The experimental data come from the Berlin Emotional Database, and emotion classification is performed with a Multi-Layer Perceptron neural network. The results show that hybrid features give better overall recognition rates than voice quality or prosodic features alone: the best overall recognition rate with hybrid features is 75.51%, compared with 64.67% for prosodic features and 59.63% for voice quality features. Recognition performance nevertheless varies across emotions; with hybrid features, anger is recognized best at 88%, while disgust is lowest at only 52%.
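The classification setup described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a scikit-learn `MLPClassifier`, and the feature matrix below is a random stand-in for the voice quality, prosodic, or hybrid features the paper extracts from the Berlin Emotional Database (535 utterances, 7 emotion classes).

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical feature matrix standing in for the paper's hybrid
# features: 535 utterances (the size of the Berlin Emotional Database)
# by 30 feature dimensions, with 7 emotion labels (anger, boredom,
# disgust, fear, happiness, sadness, neutral).
X = rng.normal(size=(535, 30))
y = rng.integers(0, 7, size=535)

# Hold out a test split so the "overall recognition rate" is measured
# on unseen utterances.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# A Multi-Layer Perceptron classifier; the hidden-layer size here is
# an assumption, as the paper's network configuration is not given in
# the abstract.
clf = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)

# Overall recognition rate = accuracy over all emotions on the test set.
acc = clf.score(X_test, y_test)
```

With real voice quality or prosodic features in place of the random matrix, `acc` corresponds to the overall recognition rates the paper reports, and a per-class confusion matrix would expose the per-emotion variation (e.g. anger vs. disgust).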