Puzzling Out Emotions: A Deep-Learning Approach to Multimodal Sentiment Analysis

V. Shrivastava, Vivek Richhariya, Vineet Richhariya
{"title":"Puzzling Out Emotions: A Deep-Learning Approach to Multimodal Sentiment Analysis","authors":"V. Shrivastava, Vivek Richhariya, Vineet Richhariya","doi":"10.1109/ICACAT.2018.8933684","DOIUrl":null,"url":null,"abstract":"Emotions steer both active and passive semantics of human interactions. Precise analysis of these emotions is indispensable to ensure a meaningful communication. Humans, in general, express their emotions in various forms. In order to encompass multiple dimensions of these expressions, this paper proposes a triple-layer (facial, verbal, and vocal) sentiment analysis system based on an application of deep-learning concepts. As such, in our experiment, first we separately examined facial expressions, verbal sentiments and vocal characteristics of a speaker and then mapped the individual results to perform a complete multimodal sentiment analysis. As a part of our two-stage facial expression analysis algorithm, we trained three multi-layer perceptrons using backpropagation technique to recognize a number of action units in human faces and seven single layer perceptrons each to identify one of seven basic human emotions (happiness, sadness, surprise, anger, fear, contempt or disgust, and neutral) expressed by the action units. In our vocal analysis module, we extracted important features (such as, jittering, shimmering, etc.) from sampled audio signals using standard formulae and used those features in a Bayesian Classifier to determine the type of sentiment (positive, negative, or neutral) in the voice. In the final segment of our experiment, we trained seven one dimensional convolutional neural networks to analyze verbal sentiments using the results of vocal analysis module as a bias. We were able to obtain results with as high as 91.80% (training) and 88% (testing) accuracies in our vocal and verbal analysis module; whereas, our facial expression analysis module provided results with 93.71% (training) and 92% (testing) accuracies.","PeriodicalId":6575,"journal":{"name":"2018 International Conference on Advanced Computation and Telecommunication (ICACAT)","volume":"137 1","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 International Conference on Advanced Computation and Telecommunication (ICACAT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICACAT.2018.8933684","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Emotions steer both the active and passive semantics of human interactions, and precise analysis of these emotions is indispensable for meaningful communication. Humans, in general, express their emotions in various forms. To encompass multiple dimensions of these expressions, this paper proposes a triple-layer (facial, verbal, and vocal) sentiment analysis system based on an application of deep-learning concepts. In our experiment, we first examined the facial expressions, verbal sentiments, and vocal characteristics of a speaker separately, and then mapped the individual results to perform a complete multimodal sentiment analysis. As part of our two-stage facial expression analysis algorithm, we trained three multi-layer perceptrons with the backpropagation technique to recognize a number of action units in human faces, and seven single-layer perceptrons, each identifying one of seven basic human emotions (happiness, sadness, surprise, anger, fear, contempt or disgust, and neutral) expressed by those action units. In our vocal analysis module, we extracted important features (such as jitter and shimmer) from sampled audio signals using standard formulae and fed those features to a Bayesian classifier to determine the type of sentiment (positive, negative, or neutral) in the voice. In the final segment of our experiment, we trained seven one-dimensional convolutional neural networks to analyze verbal sentiments, using the results of the vocal analysis module as a bias. We obtained accuracies as high as 91.80% (training) and 88% (testing) in our vocal and verbal analysis module, while our facial expression analysis module achieved 93.71% (training) and 92% (testing) accuracy.
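
To make the described pipeline concrete, below is a minimal sketch of the three modules, built from off-the-shelf components (scikit-learn and Keras) on synthetic data. The feature names, array shapes, network sizes, and the way the vocal posterior is concatenated into the text model as a bias input are illustrative assumptions, not the authors' actual configuration.

    # Minimal sketch of the three-module pipeline, on synthetic data.
    # All shapes, feature names, and hyperparameters are illustrative assumptions.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neural_network import MLPClassifier
    from tensorflow.keras import layers, models

    rng = np.random.default_rng(0)

    # --- Facial module (assumed shapes): stage 1 maps face descriptors to
    # action units, stage 2 maps action units to one of seven emotions. ---
    face_descriptors = rng.normal(size=(200, 64))              # toy face features
    action_units = (rng.random((200, 12)) > 0.5).astype(int)   # toy AU labels (multi-label)
    au_detector = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300)
    au_detector.fit(face_descriptors, action_units)            # backprop-trained MLP

    emotion_labels = rng.integers(0, 7, size=200)              # 7 basic emotions
    emotion_stage = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300)
    emotion_stage.fit(action_units, emotion_labels)

    predicted_aus = au_detector.predict(face_descriptors)      # chain the two stages
    predicted_emotions = emotion_stage.predict(predicted_aus)

    # --- Vocal module: jitter/shimmer-style scalar features fed to a Bayesian
    # (here Gaussian naive Bayes) classifier over {negative, neutral, positive}. ---
    vocal_features = rng.normal(size=(200, 4))                 # e.g. jitter, shimmer, pitch, energy
    vocal_labels = rng.integers(0, 3, size=200)
    vocal_clf = GaussianNB().fit(vocal_features, vocal_labels)
    vocal_bias = vocal_clf.predict_proba(vocal_features)       # posterior used to bias the text model

    # --- Verbal module: a 1-D convolutional network over token sequences, with
    # the vocal posterior concatenated in as an auxiliary (bias) input. ---
    vocab_size, seq_len = 1000, 20
    tokens = rng.integers(0, vocab_size, size=(200, seq_len))
    verbal_labels = rng.integers(0, 3, size=200)

    text_in = layers.Input(shape=(seq_len,))
    bias_in = layers.Input(shape=(3,))
    x = layers.Embedding(vocab_size, 32)(text_in)
    x = layers.Conv1D(64, 3, activation="relu")(x)
    x = layers.GlobalMaxPooling1D()(x)
    x = layers.Concatenate()([x, bias_in])
    out = layers.Dense(3, activation="softmax")(x)
    verbal_model = models.Model([text_in, bias_in], out)
    verbal_model.compile(optimizer="adam",
                         loss="sparse_categorical_crossentropy",
                         metrics=["accuracy"])
    verbal_model.fit([tokens, vocal_bias], verbal_labels, epochs=2, verbose=0)

In this sketch the emotion stage is collapsed into a single multi-class MLP rather than seven separate single-layer perceptrons, and the vocal bias is injected by simple feature concatenation; both are simplifications made under stated assumptions.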