Puzzling Out Emotions: A Deep-Learning Approach to Multimodal Sentiment Analysis
V. Shrivastava, Vivek Richhariya, Vineet Richhariya
2018 International Conference on Advanced Computation and Telecommunication (ICACAT), pp. 1-6, December 2018. DOI: 10.1109/ICACAT.2018.8933684
Citations: 3
Abstract
Emotions steer both the active and passive semantics of human interactions, and precise analysis of these emotions is indispensable for meaningful communication. Humans, in general, express their emotions in various forms. To encompass multiple dimensions of these expressions, this paper proposes a triple-layer (facial, verbal, and vocal) sentiment analysis system based on an application of deep-learning concepts. In our experiment, we first examined the facial expressions, verbal sentiments, and vocal characteristics of a speaker separately, and then mapped the individual results to perform a complete multimodal sentiment analysis. As part of our two-stage facial expression analysis algorithm, we trained three multi-layer perceptrons with the backpropagation technique to recognize a number of action units in human faces, and seven single-layer perceptrons, each identifying one of seven basic human emotions (happiness, sadness, surprise, anger, fear, contempt or disgust, and neutral) expressed by those action units. In our vocal analysis module, we extracted important features (such as jitter and shimmer) from sampled audio signals using standard formulae and fed them to a Bayesian classifier to determine the type of sentiment (positive, negative, or neutral) in the voice. In the final segment of our experiment, we trained seven one-dimensional convolutional neural networks to analyze verbal sentiments, using the results of the vocal analysis module as a bias. Our vocal and verbal analysis modules achieved accuracies as high as 91.80% (training) and 88% (testing), while our facial expression analysis module achieved 93.71% (training) and 92% (testing) accuracy.
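The abstract only outlines the two-stage facial pipeline, so the following is a minimal sketch of that cascade using scikit-learn stand-ins: a single multi-label MLP (trained with backpropagation) replaces the paper's three action-unit MLPs, and seven one-vs-rest single-layer perceptrons vote on the final emotion. The feature dimensions, action-unit count, and all data here are hypothetical placeholders, not the authors' setup.

```python
# Sketch of the two-stage facial pipeline (stand-in, not the paper's code).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import Perceptron

rng = np.random.default_rng(0)

# Stage 1: an MLP trained with backpropagation to detect action units
# (a single multi-label MLP stands in for the paper's three MLPs).
X_face = rng.random((200, 64))            # placeholder facial feature vectors
y_aus = rng.integers(0, 2, (200, 12))     # placeholder: 12 binary action units
au_detector = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
au_detector.fit(X_face, y_aus)

# Stage 2: one single-layer perceptron per emotion, each trained one-vs-rest
# on the detected action-unit pattern.
emotions = ["happiness", "sadness", "surprise", "anger",
            "fear", "contempt/disgust", "neutral"]
y_emotion = rng.integers(0, 7, 200)       # placeholder emotion labels
au_pattern = au_detector.predict(X_face)
emotion_perceptrons = []
for idx, name in enumerate(emotions):
    clf = Perceptron(max_iter=1000)
    clf.fit(au_pattern, (y_emotion == idx).astype(int))
    emotion_perceptrons.append(clf)

# Inference: pick the emotion whose perceptron fires with the largest margin.
scores = [clf.decision_function(au_pattern[:1])[0] for clf in emotion_perceptrons]
print("predicted emotion:", emotions[int(np.argmax(scores))])
```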
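The vocal module reportedly computes jitter and shimmer with standard formulae and passes them to a Bayesian classifier. Below is a minimal sketch under that reading, using the common local definitions (mean absolute difference of consecutive pitch periods or peak amplitudes, normalized by the mean) and a Gaussian naive Bayes classifier as a stand-in; the period/amplitude extraction step and the training data are illustrative placeholders.

```python
# Sketch of the vocal module: local jitter/shimmer features + naive Bayes.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def jitter_local(periods):
    """Mean absolute difference of consecutive pitch periods / mean period."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def shimmer_local(amplitudes):
    """Mean absolute difference of consecutive peak amplitudes / mean amplitude."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

# Placeholder training set: one (jitter, shimmer) pair per utterance,
# labelled positive / negative / neutral.
rng = np.random.default_rng(1)
X_train = np.column_stack([rng.uniform(0.002, 0.03, 60),   # jitter values
                           rng.uniform(0.02, 0.2, 60)])    # shimmer values
y_train = rng.choice(["positive", "negative", "neutral"], 60)

clf = GaussianNB()
clf.fit(X_train, y_train)

# Classify one new utterance from its extracted per-cycle data.
periods = rng.normal(0.008, 0.0002, 50)      # hypothetical pitch periods (s)
amps = rng.normal(0.5, 0.03, 50)             # hypothetical peak amplitudes
features = [[jitter_local(periods), shimmer_local(amps)]]
print("vocal sentiment:", clf.predict(features)[0])
```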
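Finally, the abstract states that seven one-dimensional CNNs analyze verbal sentiment with the vocal result as a bias. One plausible reading, sketched below in Keras for a single such network, concatenates a one-hot encoding of the vocal prediction with the pooled convolutional features before the decision layer; the vocabulary size, sequence length, and this bias-injection point are assumptions, not details confirmed by the paper.

```python
# Sketch of one verbal 1-D CNN with the vocal result injected as an extra input.
import numpy as np
from tensorflow.keras import layers, Model

VOCAB, SEQ_LEN = 5000, 40

tokens_in = layers.Input(shape=(SEQ_LEN,), name="tokens")      # word indices
vocal_in = layers.Input(shape=(3,), name="vocal_bias")         # one-hot pos/neg/neu

x = layers.Embedding(VOCAB, 64)(tokens_in)
x = layers.Conv1D(64, kernel_size=3, activation="relu")(x)
x = layers.GlobalMaxPooling1D()(x)
x = layers.Concatenate()([x, vocal_in])                        # inject vocal result
out = layers.Dense(1, activation="sigmoid")(x)                 # one emotion vs rest

model = Model([tokens_in, vocal_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Placeholder batch: random token ids, random vocal one-hots, random labels.
rng = np.random.default_rng(2)
X_tokens = rng.integers(1, VOCAB, (32, SEQ_LEN))
X_vocal = np.eye(3)[rng.integers(0, 3, 32)]
y = rng.integers(0, 2, 32)
model.fit([X_tokens, X_vocal], y, epochs=1, verbose=0)
```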