Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge: Latest Publications

Towards Multimodal Prediction of Time-continuous Emotion using Pose Feature Engineering and a Transformer Encoder
Ho-min Park, Ilho Yun, Ajit Kumar, A. Singh, Bong Jun Choi, Dhananjay Singh, W. D. Neve
{"title":"Towards Multimodal Prediction of Time-continuous Emotion using Pose Feature Engineering and a Transformer Encoder","authors":"Ho-min Park, Ilho Yun, Ajit Kumar, A. Singh, Bong Jun Choi, Dhananjay Singh, W. D. Neve","doi":"10.1145/3551876.3554807","DOIUrl":"https://doi.org/10.1145/3551876.3554807","url":null,"abstract":"MuSe-Stress 2022 aims at building sequence regression models for predicting valence and physiological arousal levels of persons who are facing stressful conditions. To that end, audio-visual recordings, transcripts, and physiological signals can be leveraged. In this paper, we describe the approach we developed for Muse-Stress 2022. Specifically, we engineered a new pose feature that captures the movement of human body keypoints. We also trained a Long Short-Term Memory (LSTM) network and a Transformer encoder on different types of feature sequences and different combinations thereof. In addition, we adopted a two-pronged strategy to tune the hyperparameters that govern the different ways the available features can be used. Finally, we made use of late fusion to combine the predictions obtained for the different unimodal features. Our experimental results show that the newly engineered pose feature obtains the second highest development CCC among the seven unimodal features available. Furthermore, our Transformer encoder obtains the highest development CCC for five out of fourteen possible combinations of features and emotion dimensions, with this number increasing from five to nine when performing late fusion. In addition, when searching for optimal hyperparameter settings, our two-pronged hyperparameter tuning strategy leads to noticeable improvements in maximum development CCC, especially when the underlying models are based on an LSTM. In summary, we can conclude that our approach is able to achieve a test CCC of 0.6196 and 0.6351 for arousal and valence, respectively, securing a Top-3 rank in Muse-Stress 2022.","PeriodicalId":434392,"journal":{"name":"Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134182891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Emotional Reaction Analysis based on Multi-Label Graph Convolutional Networks and Dynamic Facial Expression Recognition Transformer
Kexin Wang, Zheng Lian, Licai Sun, B. Liu, Jianhua Tao, Yin Fan
{"title":"Emotional Reaction Analysis based on Multi-Label Graph Convolutional Networks and Dynamic Facial Expression Recognition Transformer","authors":"Kexin Wang, Zheng Lian, Licai Sun, B. Liu, Jianhua Tao, Yin Fan","doi":"10.1145/3551876.3554810","DOIUrl":"https://doi.org/10.1145/3551876.3554810","url":null,"abstract":"Automatically predicting and understanding human emotional reactions have wide applications in human-computer interaction. In this paper, we present our solutions to the MuSe-Reaction sub-challenge in MuSe 2022. The task of this sub-challenge is to predict the intensity of 7 emotional expressions from human reactions to a wide range of emotionally evocative stimuli. Specifically, we design an end-to-end model, which is composed of a Spatio-Temporal Transformer for dynamic facial representation learning and a multi-label graph convolutional network for emotion dependency modeling.We also explore the effects of a temporal model with a variety of features from acoustic and visual modalities. Our proposed method achieves mean Pearson's correlation coefficient of 0.3375 on the test set of MuSe-Reaction, which outperforms the baseline system(i.e., 0.2801) by a large margin.","PeriodicalId":434392,"journal":{"name":"Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130605203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Integrating Cross-modal Interactions via Latent Representation Shift for Multi-modal Humor Detection
Chengxin Chen, Pengyuan Zhang
{"title":"Integrating Cross-modal Interactions via Latent Representation Shift for Multi-modal Humor Detection","authors":"Chengxin Chen, Pengyuan Zhang","doi":"10.1145/3551876.3554805","DOIUrl":"https://doi.org/10.1145/3551876.3554805","url":null,"abstract":"Multi-modal sentiment analysis has been an active research area and has attracted increasing attention from multi-disciplinary communities. However, it is still challenging to fuse the information from different modalities in an efficient way. In prior studies, the late fusion strategy has been commonly adopted due to its simplicity and efficacy. Unfortunately, it failed to model the interactions across different modalities. In this paper, we propose a transformer-based hierarchical framework to effectively model both the intrinsic semantics and cross-modal interactions of the relevant modalities. Specifically, the features from each modality are first encoded via standard transformers. Later, the cross-modal interactions from one modality to other modalities are calculated using cross-modal transformers. The derived intrinsic semantics and cross-modal interactions are used to determine the latent representation shift of a particular modality. We evaluate the proposed approach on the MuSe-Humor sub-challenge of Multi-modal Sentiment Analysis Challenge (MuSe) 2022. Experimental results show that an Area Under the Curve (AUC) of 0.9065 can be achieved on the test set of MuSe-Humor. With the promising results, our best submission ranked first place in the sub-challenge.","PeriodicalId":434392,"journal":{"name":"Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130753266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Bridging the Gap: End-to-End Domain Adaptation for Emotional Vocalization Classification using Adversarial Learning
Dominik Schiller, Silvan Mertes, P. V. Rijn, E. André
{"title":"Bridging the Gap: End-to-End Domain Adaptation for Emotional Vocalization Classification using Adversarial Learning","authors":"Dominik Schiller, Silvan Mertes, P. V. Rijn, E. André","doi":"10.1145/3551876.3554816","DOIUrl":"https://doi.org/10.1145/3551876.3554816","url":null,"abstract":"Good classification performance on a hold-out partition can only be expected if the data distribution of the test data matches the training data. However, in many real-life use cases, this constraint is not met. In this work, we explore if it is feasible to use existing methods of an adversarial domain transfer to bridge this inter-domain gap. To do so, we use a CycleGAN that was trained on converting between the domains. We demonstrate that the quality of the generated data has a substantial impact on the effectiveness of the domain adaptation, and propose an additional step to overcome this problem. To evaluate the approach, we classify emotions in female and male vocalizations. Furthermore, we show that our model successfully approximates the distribution of acoustic features and that our approach can be employed to improve emotion classification performance. Since the presented approach is domain and feature independent it can therefore be applied to any classification task.","PeriodicalId":434392,"journal":{"name":"Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121472135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ViPER
L. Vaiani, Moreno La Quatra, Luca Cagliero, P. Garza
{"title":"ViPER","authors":"L. Vaiani, Moreno La Quatra, Luca Cagliero, P. Garza","doi":"10.1145/3551876.3554806","DOIUrl":"https://doi.org/10.1145/3551876.3554806","url":null,"abstract":"Recognizing human emotions from videos requires a deep understanding of the underlying multimodal sources, including images, audio, and text. Since the input data sources are highly variable across different modality combinations, leveraging multiple modalities often requires ad hoc fusion networks. To predict the emotional arousal of a person reacting to a given video clip we present ViPER, a multimodal architecture leveraging a modality-agnostic transformer based model to combine video frames, audio recordings, and textual annotations. Specifically, it relies on a modality-agnostic late fusion network which makes ViPER easily adaptable to different modalities. The experiments carried out on the Hume-Reaction datasets of the MuSe-Reaction challenge confirm the effectiveness of the proposed approach.","PeriodicalId":434392,"journal":{"name":"Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122467788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
A Personalised Approach to Audiovisual Humour Recognition and its Individual-level Fairness
Alexander Kathan, S. Amiriparian, Lukas Christ, Andreas Triantafyllopoulos, Niklas Müller, Andreas König, B. Schuller
{"title":"A Personalised Approach to Audiovisual Humour Recognition and its Individual-level Fairness","authors":"Alexander Kathan, S. Amiriparian, Lukas Christ, Andreas Triantafyllopoulos, Niklas Müller, Andreas König, B. Schuller","doi":"10.1145/3551876.3554800","DOIUrl":"https://doi.org/10.1145/3551876.3554800","url":null,"abstract":"Humour is one of the most subtle and contextualised behavioural patterns to study in social psychology and has a major impact on human emotions, social cognition, behaviour, and relations. Consequently, an automatic understanding of humour is crucial and challenging for a naturalistic human-robot interaction. Recent artificial intelligence (AI)-based methods have shown progress in multimodal humour recognition. However, such methods lack a mechanism in adapting to each individual's characteristics, resulting in a decreased performance, e.g., due to different facial expressions. Further, these models are faced with generalisation problems when being applied for recognition of different styles of humour. We aim to address these challenges by introducing a novel multimodal humour recognition approach in which the models are personalised for each individual in the Passau Spontaneous Football Coach Humour (Passau-SFCH) dataset. We begin by training a model on all individuals in the dataset. Subsequently, we fine-tune all layers of this model with the data from each individual. Finally, we use these models for the prediction task. Using the proposed personalised models, it is possible to significantly (two-tailed t-test, p < 0.05) outperform the non-personalised models. In particular, the mean Area Under the Curve (AUC) is increased from .7573 to .7731 for the audio modality, and from .9203 to .9256 for the video modality. In addition, we apply a weighted late fusion approach which increases the overall performance to an AUC of .9308, demonstrating the complementarity of the features. Finally, we evaluate the individual-level fairness of our approach and show which group of subjects benefits most of using personalisation.","PeriodicalId":434392,"journal":{"name":"Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134494823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
The Dos and Don'ts of Affect Analysis
S. Amiriparian
{"title":"The Dos and Don'ts of Affect Analysis","authors":"S. Amiriparian","doi":"10.1145/3551876.3554815","DOIUrl":"https://doi.org/10.1145/3551876.3554815","url":null,"abstract":"As an inseparable and crucial component of communication affects play a substantial role in human-device and human-human interaction. They convey information about a person's specific traits and states [1, 4, 5], how one feels about the aims of a conversation, the trustworthiness of one's verbal communication [3], and the degree of adaptation in interpersonal speech [2]. This multifaceted nature of human affects poses a great challenge when it comes to applying machine learning systems for their automatic recognition and understanding. Contemporary self-supervised learning architectures such as Transformers, which define state-of-the-art (SOTA) in this area, have shown noticeable deficits in terms of explainability, while more conventional, non-deep machine learning methods, which provide more transparency, often fall (far) behind SOTA systems. So, is it possible to get the best of these two 'worlds'? And more importantly, at what price? In this talk, I provide a set of Dos and Don'ts guidelines for addressing affective computing tasks w. r. t. (i) preserving privacy for affective data and individuals/groups, (ii) being efficient in computing such data in a transparent way, (iii) ensuring reproducibility of the results, (iv) knowing the differences between causation and correlation, and (v) properly applying social and ethical protocols.","PeriodicalId":434392,"journal":{"name":"Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127922781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Uncovering the Nuanced Structure of Expressive Behavior Across Modalities
Alan S. Cowen
{"title":"Uncovering the Nuanced Structure of Expressive Behavior Across Modalities","authors":"Alan S. Cowen","doi":"10.1145/3551876.3554814","DOIUrl":"https://doi.org/10.1145/3551876.3554814","url":null,"abstract":"Guided by semantic space theory, large-scale computational studies have advanced our understanding of the structure and function of expressive behavior. I will integrate findings from experimental studies of facial expression (N=19,656), vocal bursts (N=12,616), speech prosody (N=20,109), multimodal reactions (N=8,056), and an ongoing study of dyadic interactions (N=1,000+). These studies combine methods from psychology and computer science to yield new insights into what expressive behaviors signal, how they are perceived, and how they shape social interaction. Using machine learning to extract cross-cultural dimensions of behavior while minimizing biases due to demographics and context, we arrive at objective measures of the structural dimensions that make up human expression. Expressions are consistently found to be high-dimensional and blended, with their meaning across cultures being efficiently conceptualized in terms of a wide range of specific emotion concepts. Altogether, these findings generate a comprehensive new atlas of expressive behavior, which I will explore through a variety of visualizations. This new taxonomy departs from models such as the basic six and affective circumplex, suggesting a new way forward for expression understanding and sentiment analysis.","PeriodicalId":434392,"journal":{"name":"Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge","volume":"71 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114157384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multimodal Temporal Attention in Sentiment Analysis
Yu He, Licai Sun, Zheng Lian, B. Liu, Jianhua Tao, Meng Wang, Yuan Cheng
{"title":"Multimodal Temporal Attention in Sentiment Analysis","authors":"Yu He, Licai Sun, Zheng Lian, B. Liu, Jianhua Tao, Meng Wang, Yuan Cheng","doi":"10.1145/3551876.3554811","DOIUrl":"https://doi.org/10.1145/3551876.3554811","url":null,"abstract":"In this paper, we present the solution to the MuSe-Stress sub-challenge in the MuSe 2022 Multimodal Sentiment Analysis Challenge. The task of MuSe-Stress is to predict a time-continuous value (i.e., physiological arousal and valence) based on multimodal data of audio, visual, text, and physiological signals. In this competition, we find that multimodal fusion has good performance for physiological arousal on the validation set, but poor prediction performance on the test set. We believe that problem may be due to the over-fitting caused by the model's over-reliance on some specific modal features. To deal with the above problem, we propose Multimodal Temporal Attention (MMTA), which considers the temporal effects of all modalities on each unimodal branch, realizing the interaction between unimodal branches and adaptive inter-modal balance. The concordance correlation coefficient (CCC) of physiological arousal and valence are 0.6818 with MMTA and 0.6841 with early fusion, respectively, both ranking Top 1, outperforming the baseline system by a large margin (i.e., 0.4761 and 0.4931) on the test set.","PeriodicalId":434392,"journal":{"name":"Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128276502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Comparing Biosignal and Acoustic feature Representation for Continuous Emotion Recognition
Sarthak Yadav, Tilak Purohit, Z. Mostaani, Bogdan Vlasenko, M. Magimai.-Doss
{"title":"Comparing Biosignal and Acoustic feature Representation for Continuous Emotion Recognition","authors":"Sarthak Yadav, Tilak Purohit, Z. Mostaani, Bogdan Vlasenko, M. Magimai.-Doss","doi":"10.1145/3551876.3554812","DOIUrl":"https://doi.org/10.1145/3551876.3554812","url":null,"abstract":"Automatic recognition of human emotion has a wide range of applications. Human emotions can be identified across different modalities, such as biosignal, speech, text, and mimics. This paper is focusing on time-continuous prediction of level of valence and psycho-physiological arousal. In that regard, we investigate, (a) the use of different feature embeddings obtained from neural networks pre-trained on different speech tasks (e.g., phone classification, speech emotion recognition) and self-supervised neural networks, (b) estimation of arousal and valence from physiological signals in an end-to-end manner and (c) combining different neural embeddings. Our investigations on the MuSe-Stress sub-challenge shows that (a) the embeddings extracted from physiological signals using CNNs trained in an end-to-end manner improves over the baseline approach of modeling physiological signals, (b) neural embeddings obtained from phone classification neural network and speech emotion recognition neural network trained on auxiliary language data sets yield improvement over baseline systems purely trained on the target data, and (c) task-specific neural embeddings yield improved performance over self-supervised neural embeddings for both arousal and valence. Our best performing system on test-set surpass the DeepSpectrum baseline (combined score) by a relative 7.7% margin","PeriodicalId":434392,"journal":{"name":"Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge","volume":"11 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125764651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4