Capsule Networks based Acoustic Emotion Recognition using Mel Cepstral Features
Authors: Y. Bhanusree, T. V. V. Reddy, S. K. Rao
Venue: 2021 International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems (ICSES), pp. 1-7
Published: 2021-09-24
DOI: https://doi.org/10.1109/ICSES52305.2021.9633952
Citations: 1
Abstract
Human emotions play an important role in personal and work life. Performance in the workplace depends largely on varying emotions, which can be captured through facial expressions, body language, and acoustics. Identifying basic emotions from speech has its own advantages and is an area of active progress. Acoustic emotion recognition can be either linguistic or non-linguistic; the latter is more flexible because it is language independent. Most work in this area to date has relied on machine learning algorithms, often at the cost of accuracy. Deep neural networks, on the other hand, have been shown to achieve higher accuracy. Convolutional neural networks used for feature extraction are limited in their ability to capture both temporal and spatial features. Capsule networks are one of the improved solutions to this problem. The proposed work uses capsule networks with dynamic routing in combination with a Conv1D layer. The proposed model was evaluated on the RAVDESS, SAVEE, CREMA-D, EMODB, and IEMOCAP corpora and was found to be successful. Improved test accuracy was achieved on every corpus.
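The pipeline named in the abstract (mel cepstral features, a Conv1D layer, then capsules with dynamic routing) can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the abstract gives no layer sizes, so the 13 coefficients, 32 filters, 8- and 16-dimensional capsules, 7 output classes, random weights, and the random stand-in for MFCC input are all assumptions made only to show the data flow. The routing loop follows the commonly described routing-by-agreement procedure.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Capsule non-linearity: keeps direction, maps the norm into [0, 1)."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def conv1d(x, kernels, stride=1):
    """Valid 1-D convolution along the time axis.
    x: (time, in_channels); kernels: (width, in_channels, out_channels)."""
    width, _, out_ch = kernels.shape
    steps = (x.shape[0] - width) // stride + 1
    out = np.empty((steps, out_ch))
    for t in range(steps):
        window = x[t * stride : t * stride + width]  # (width, in_channels)
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1]))
    return out

def dynamic_routing(u_hat, n_iters=3):
    """Routing by agreement.
    u_hat: (n_in, n_out, dim_out) prediction vectors.
    Returns output capsules of shape (n_out, dim_out)."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits, start uniform
    for _ in range(n_iters):
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)           # softmax over output capsules
        s = np.einsum("ij,ijd->jd", c, u_hat)       # weighted sum per output capsule
        v = squash(s)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)   # raise logits where predictions agree
    return v

rng = np.random.default_rng(0)

# Placeholder input: 100 frames x 13 coefficients (random values, not real MFCCs).
mfcc = rng.standard_normal((100, 13))

# Conv1D layer with ReLU (assumed: width 5, 32 filters, stride 2).
kernels = rng.standard_normal((5, 13, 32)) * 0.1
feat = np.maximum(conv1d(mfcc, kernels, stride=2), 0.0)   # (48, 32)

# Primary capsules: group the 32 channels of each frame into 4 capsules of dim 8.
primary = squash(feat.reshape(-1, 8))                     # (192, 8)

# Class capsules (assumed: 7 emotion classes, dim 16), untrained random weights.
W = rng.standard_normal((primary.shape[0], 7, 8, 16)) * 0.1
u_hat = np.einsum("id,ijdk->ijk", primary, W)             # (192, 7, 16)
class_caps = dynamic_routing(u_hat, n_iters=3)            # (7, 16)

# The length of each class capsule acts as that class's score.
scores = np.linalg.norm(class_caps, axis=-1)
print(scores.shape, int(np.argmax(scores)))
```

With untrained weights the predicted class is meaningless; the sketch only shows how capsule lengths in [0, 1) come out of the routing step and can be read as per-class scores. In practice the MFCC matrix would come from an audio feature extractor and the weights would be learned.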