{"title":"Multimodal Emotion Recognition using Deep Learning Architectures","authors":"Iram Hina, A. Shaukat, M. Akram","doi":"10.1109/ICoDT255437.2022.9787437","DOIUrl":null,"url":null,"abstract":"Emotions are an essential part of immaculate communication. The purpose of this research work is to classify six basic emotions of humans namely anger, disgust, fear, happiness, sadness and surprise. In proposed method a sequential deep convolutional neural network is proposed for audio and visual modality. Audio classification is performed via fine-tuning of a pre-trained AlexNet model whereas, visual classification is performed with a hybrid deep network containing CNN and LSTM. Decision level and score level fusion have been implemented for multimodalities. SVM, random forest, K-NN, and logistic regression classifiers were being used for classifying emotion for fused audio-visual data. Experiments have been performed on RML and BAUM-1s dataset with LOSO and LOSGO cross validation techniques respectively. Recognition rates were extremely positive which shows the validity of the proposed methodology.","PeriodicalId":291030,"journal":{"name":"2022 2nd International Conference on Digital Futures and Transformative Technologies (ICoDT2)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 2nd International Conference on Digital Futures and Transformative Technologies (ICoDT2)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICoDT255437.2022.9787437","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Emotions are an essential part of effective communication. The purpose of this research work is to classify six basic human emotions, namely anger, disgust, fear, happiness, sadness, and surprise. The proposed method employs a sequential deep convolutional neural network for each of the audio and visual modalities. Audio classification is performed by fine-tuning a pre-trained AlexNet model, whereas visual classification is performed with a hybrid deep network combining a CNN and an LSTM. Decision-level and score-level fusion have been implemented for the multimodal setting. SVM, random forest, k-NN, and logistic regression classifiers were used to classify emotions from the fused audio-visual data. Experiments were performed on the RML and BAUM-1s datasets with LOSO and LOSGO cross-validation techniques, respectively. Recognition rates were highly encouraging, demonstrating the validity of the proposed methodology.
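To make the described pipeline concrete, below is a minimal PyTorch/scikit-learn sketch of the two-branch design the abstract outlines: an AlexNet-based audio branch, a CNN+LSTM visual branch, and score-level fusion feeding a classical classifier. This is not the authors' published code; the layer sizes, the spectrogram-style audio input, the torchvision weights API (assumes torchvision >= 0.13), and the 12-dimensional fused score vector are all illustrative assumptions.

```python
# Illustrative sketch of the abstract's pipeline (not the authors' code):
# AlexNet audio branch, CNN+LSTM visual branch, score-level fusion + SVM.
import torch
import torch.nn as nn
import torchvision.models as models
from sklearn.svm import SVC

NUM_EMOTIONS = 6  # anger, disgust, fear, happiness, sadness, surprise

def build_audio_branch():
    """AlexNet fine-tuned for 6-way emotion output.
    Assumes spectrogram-like 3x224x224 inputs (an assumption, not stated)."""
    net = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
    net.classifier[6] = nn.Linear(4096, NUM_EMOTIONS)  # replace 1000-way head
    return net

class VisualCNNLSTM(nn.Module):
    """Per-frame CNN features fed to an LSTM; all sizes are assumptions."""
    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, NUM_EMOTIONS)

    def forward(self, clips):  # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)  # (B, T, F)
        _, (h, _) = self.lstm(feats)  # h: (num_layers, B, hidden)
        return self.head(h[-1])       # per-clip emotion logits

@torch.no_grad()
def fused_scores(audio_net, visual_net, spec, clip):
    """Score-level fusion: concatenate the branches' softmax scores."""
    a = torch.softmax(audio_net(spec), dim=1)
    v = torch.softmax(visual_net(clip), dim=1)
    return torch.cat([a, v], dim=1).numpy()  # (B, 12) fused score vectors

def fit_fusion_classifier(train_scores, train_labels):
    """Train one of the fusion classifiers the abstract lists (SVM here;
    random forest, k-NN, or logistic regression would slot in the same way)."""
    return SVC(kernel="rbf").fit(train_scores, train_labels)
```

In this sketch, decision-level fusion would instead combine the two branches' hard predictions (e.g., by weighted voting), while the score-level variant shown passes the concatenated softmax scores to the downstream classifier.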