{"title":"Temporal based Emotion Recognition inspired by Activity Recognition models","authors":"Balaganesh Mohan, Mirela C. Popa","doi":"10.1109/aciiw52867.2021.9666356","DOIUrl":null,"url":null,"abstract":"Affective computing is a subset of the larger field of human-computer interaction, having important connections with cognitive processes, influencing the learning process, decision-making and perception. Out of the multiple means of communication, facial expressions are one of the most widely accepted channels for emotion modulation, receiving an increased attention during the last few years. An important aspect, contributing to their recognition success, concerns modeling the temporal dimension. Therefore, this paper aims to investigate the applicability of current state-of-the-art action recognition techniques to the human emotion recognition task. In particular, two different architectures were investigated, a CNN-based model, named Temporal Shift Module (TSM) that can learn spatiotemporal features in 3D data with the computational complexity of a 2D CNN and a video based vision transformer, employing spatio-temporal self attention. 
The models were trained and tested on the CREMA-D dataset, demonstrating state-of-the-art performance, with a mean class accuracy of 82% and 77% respectively, while outperforming best previous approaches by at least 3.5%.","PeriodicalId":105376,"journal":{"name":"2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/aciiw52867.2021.9666356","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Affective computing is a subset of the larger field of human-computer interaction, with important connections to cognitive processes, influencing learning, decision-making, and perception. Among the multiple means of communication, facial expressions are one of the most widely accepted channels for conveying emotion, and they have received increased attention in recent years. An important aspect contributing to their recognition success is modeling the temporal dimension. Therefore, this paper investigates the applicability of current state-of-the-art action recognition techniques to the human emotion recognition task. In particular, two different architectures were investigated: a CNN-based model, the Temporal Shift Module (TSM), which can learn spatio-temporal features in 3D data with the computational complexity of a 2D CNN, and a video-based vision transformer employing spatio-temporal self-attention. The models were trained and tested on the CREMA-D dataset, demonstrating state-of-the-art performance with mean class accuracies of 82% and 77% respectively, outperforming the best previous approaches by at least 3.5%.
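The core idea behind TSM, as described in the abstract, is to shift a small fraction of feature channels along the temporal axis (some forward in time, some backward) so that a plain 2D convolution can mix information across neighboring frames at no extra computational cost. A minimal NumPy sketch of that shift operation is given below; the function name, tensor layout, and the `shift_div` fraction are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def temporal_shift(x, shift_div=8):
    """Shift a fraction of channels along the temporal axis (TSM-style sketch).

    x: array of shape (batch, time, channels, height, width).
    shift_div: 1/shift_div of the channels is shifted in each direction
               (an assumed default; the original TSM paper uses 1/8).
    """
    b, t, c, h, w = x.shape
    fold = c // shift_div
    out = np.zeros_like(x)
    # First group of channels: shifted backward in time (frame i sees frame i+1).
    out[:, :-1, :fold] = x[:, 1:, :fold]
    # Second group: shifted forward in time (frame i sees frame i-1).
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]
    # Remaining channels are left untouched.
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]
    return out
```

In the full architecture this shift is inserted before the 2D convolution inside each residual block, so temporal mixing is obtained while the convolutions themselves stay strictly two-dimensional.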