{"title":"Multi-modal Continuous Dimensional Emotion Recognition Using Recurrent Neural Network and Self-Attention Mechanism","authors":"Licai Sun, Zheng Lian, J. Tao, Bin Liu, Mingyue Niu","doi":"10.1145/3423327.3423672","DOIUrl":"https://doi.org/10.1145/3423327.3423672","url":null,"abstract":"Automatic perception and understanding of human emotion or sentiment has a wide range of applications and has attracted increasing attention nowadays. The Multimodal Sentiment Analysis in Real-life Media (MuSe) 2020 provides a testing bed for recognizing human emotion or sentiment from multiple modalities (audio, video, and text) in the wild scenario. In this paper, we present our solutions to the MuSe-Wild sub-challenge of MuSe 2020. The goal of this sub-challenge is to perform continuous emotion (arousal and valence) predictions on a car review database, Muse-CaR. To this end, we first extract both handcrafted features and deep representations from multiple modalities. Then, we utilize the Long Short-Term Memory (LSTM) recurrent neural network as well as the self-attention mechanism to model the complex temporal dependencies in the sequence. The Concordance Correlation Coefficient (CCC) loss is employed to guide the model to learn local variations and the global trend of emotion simultaneously. Finally, two fusion strategies, early fusion and late fusion, are adopted to further boost the model's performance by exploiting complementary information from different modalities. Our proposed method achieves CCC of 0.4726 and 0.5996 for arousal and valence respectively on the test set, which outperforms the baseline system with corresponding CCC of 0.2834 and 0.2431.","PeriodicalId":246071,"journal":{"name":"Proceedings of the 1st International on Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124600811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extending Multimodal Emotion Recognition with Biological Signals: Presenting a Novel Dataset and Recent Findings","authors":"Alice Baird","doi":"10.1145/3423327.3423512","DOIUrl":"https://doi.org/10.1145/3423327.3423512","url":null,"abstract":"Multimodal fusion has shown great promise in recent literature, particularly for audio dominant tasks. In this talk, we outline a the finding from a recently developed multimodal dataset, and discuss the promise of fusing biological signals with speech for continuous recognition of the emotional dimensions of valence and arousal in the context of public speaking. As well as this, we discuss the advantage of cross-language (German and English) analysis by training language-independent models and testing them on speech from various native and non-native groupings. For the emotion recognition task used as a case study, a Long Short-Term Memory - Recurrent Neural Network (LSTM-RNN) architecture with a self-attention layer is used.","PeriodicalId":246071,"journal":{"name":"Proceedings of the 1st International on Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126470465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Social Media Mining","authors":"Y. Kompatsiaris","doi":"10.1145/3423327.3423511","DOIUrl":"https://doi.org/10.1145/3423327.3423511","url":null,"abstract":"Social media have transformed the Web into an interactive sharing platform where users upload data and media, comment on, and share this content within their social circles. The large-scale availability of user-generated content in social media platforms has opened up new possibilities for studying and understanding real-world phenomena, trends and events. The objective of this talk is to provide an overview of social media mining, which offers a unique opportunity to discover, collect, and extract relevant information in order to provide useful insights. It will include key challenges and issues, such as fighting misinformation, data collection, analysis and visualization components, applications, results and demonstrations from multiple areas ranging from news to environmental and security ones.","PeriodicalId":246071,"journal":{"name":"Proceedings of the 1st International on Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114424534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vehicle Interiors as Sensate Environments","authors":"M. Würtenberger","doi":"10.1145/3423327.3423509","DOIUrl":"https://doi.org/10.1145/3423327.3423509","url":null,"abstract":"The research field of biologically inspired and cognitive systems is currently gaining increasing interest. However, modern vehicles and their architectures are still dominated by traditional, engineered systems. This talk will give an industrial perspective on potential usage of biologicallyinspired systems and cognitive architectures in future vehicles. A vehicle's interior can be considered a highly interactive sensate environment. With the advent of highly automated driving, even more emphasis will be on this smart space and the corresponding user experience. New interior layouts become possible, with the attention shifting from the driver to the wellbeing and comfort of rider passengers in highly reconfigurable interior layouts. Tactile intelligence in particular will add an exciting new modality and help address challenges of safe human-robot coexistence. By focusing on opportunities for such approaches but also by pointing out challenges with respect to industrial requirements, the goal of this talk is to initiate and stimulate discussions regarding integration of cognitive systems in future vehicle architectures.","PeriodicalId":246071,"journal":{"name":"Proceedings of the 1st International on Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125228149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-modal Fusion for Video Sentiment Analysis","authors":"Ruichen Li, Jinming Zhao, Jingwen Hu, Shuai Guo, Qin Jin","doi":"10.1145/3423327.3423671","DOIUrl":"https://doi.org/10.1145/3423327.3423671","url":null,"abstract":"Automatic sentiment analysis can support revealing a subject's emotional state and opinion tendency toward an entity. In this paper, we present our solutions for the MuSe-Wild sub-challenge of Multimodal Sentiment Analysis in Real-life Media (MuSe) 2020. The videos in this challenge are collected from YouTube about emotional car reviews. In the scenarios, the speaker's sentiment can be conveyed in different modalities including acoustic, visual, and textual modalities. Due to the complementarity of different modalities, the fusion of the multiple modalities has a large impact on sentiment analysis. In this paper, we highlight two aspects of our solutions: 1) we explore various low-level and high-level features from different modalities for emotional state recognition, such as expert-defined low-level descriptors (LLD) and deep learned features, etc. 2) we propose several effective multi-modal fusion strategies to make full use of the different modalities. Our solutions achieve the best CCC performance of 0.4346 and 0.4513 on arousal and valence respectively on the challenge testing set, which significantly outperforms the baseline system with corresponding CCC of 0.2843 and 0.2413 on arousal and valence. The experimental results show that our proposed various effective representations of different modalities and fusion strategies have a strong generalization ability and can bring more robust performance.","PeriodicalId":246071,"journal":{"name":"Proceedings of the 1st International on Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131881803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Personalized Machine Learning for Human-centered Machine Intelligence","authors":"Ognjen Rudovic","doi":"10.1145/3423327.3423510","DOIUrl":"https://doi.org/10.1145/3423327.3423510","url":null,"abstract":"Recent developments in AI and Machine Learning (ML) are revolutionizing traditional technologies for health and education by enabling more intelligent therapeutic and learning tools that can automatically perceive and predict user's behavior (e.g. from videos) or health status from user's past clinical data. To date, most of these tools still rely on traditional 'on-size-fits-all' ML paradigm, rendering generic learning algorithms that, in most cases, are suboptimal on the individual level, mainly because of the large heterogeneity of the target population. Furthermore, such approach may provide misleading outcomes as it fails to account for context in which target behaviors/clinical data are being analyzed. This calls for new human-centered machine intelligence enabled by ML algorithms that are tailored to each individual and context under the study. In this talk, I will present the key ideas and applications of Personalized Machine Learning (PML) framework specifically designed to tackle those challenges. The applications range from personalized forecasting of Alzheimer's related cognitive decline, using Gaussian Process models, to Personalized Deep Neural Networks, designed for classification of facial affect of typical individuals using the notion of meta-learning and reinforcement learning. I will then describe in more detail how this framework can be used to tackle a challenging problem of robot perception of affect and engagement in autism therapy. Lastly, I will discuss the future research on PML and human-centered ML design, outlining challenges and opportunities.","PeriodicalId":246071,"journal":{"name":"Proceedings of the 1st International on Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123949369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AAEC: An Adversarial Autoencoder-based Classifier for Audio Emotion Recognition","authors":"Changzeng Fu, Jiaqi Shi, Chaoran Liu, C. Ishi, H. Ishiguro","doi":"10.1145/3423327.3423669","DOIUrl":"https://doi.org/10.1145/3423327.3423669","url":null,"abstract":"In recent years, automatic emotion recognition has attracted the attention of researchers because of its great effects and wide implementations in supporting humans' activities. Given that the data about emotions is difficult to collect and organize into a large database like the dataset of text or images, the true distribution would be difficult to be completely covered by the training set, which affects the model's robustness and generalization in subsequent applications. In this paper, we proposed a model, Adversarial Autoencoder-based Classifier (AAEC), that can not only augment the data within real data distribution but also reasonably extend the boundary of the current data distribution to a possible space. Such an extended space would be better to fit the distribution of training and testing sets. In addition to comparing with baseline models, we modified our proposed model into different configurations and conducted a comprehensive self-comparison with audio modality. The results of our experiment show that our proposed model outperforms the baselines.","PeriodicalId":246071,"journal":{"name":"Proceedings of the 1st International on Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117021144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MuSe 2020 Challenge and Workshop: Multimodal Sentiment Analysis, Emotion-target Engagement and Trustworthiness Detection in Real-life Media: Emotional Car Reviews in-the-wild","authors":"Lukas Stappen, Alice Baird, Georgios Rizos, Panagiotis Tzirakis, Xinchen Du, Felix Hafner, Lea Schumann, Adria Mallol-Ragolta, Björn Schuller, I. Lefter, E. Cambria, Y. Kompatsiaris","doi":"10.1145/3423327.3423673","DOIUrl":"https://doi.org/10.1145/3423327.3423673","url":null,"abstract":"Multimodal Sentiment Analysis in Real-life Media (MuSe) 2020 is a Challenge-based Workshop focusing on the tasks of sentiment recognition, as well as emotion-target engagement and trustworthiness detection by means of more comprehensively integrating the audio-visual and language modalities. The purpose of MuSe 2020 is to bring together communities from different disciplines; mainly, the audio-visual emotion recognition community (signal-based), and the sentiment analysis community (symbol-based). We present three distinct sub-challenges: MuSe-Wild, which focuses on continuous emotion (arousal and valence) prediction; MuSe-Topic, in which participants recognise 10 domain-specific topics as the target of 3-class (low, medium, high) emotions; and MuSe-Trust, in which the novel aspect of trustworthiness is to be predicted. In this paper, we provide detailed information on MuSe-CAR, the first of its kind in-the-wild database, which is utilised for the challenge, as well as the state-of-the-art features and modelling approaches applied. For each sub-challenge, a competitive baseline for participants is set; namely, on test we report for MuSe-Wild a combined (valence and arousal) CCC of .2568, for MuSe-Topic a score (computed as 0.34 * UAR + 0.66 * F1) of 76.78 % on the 10-class topic and 40.64 % on the 3-class emotion prediction, and for MuSe-Trust a CCC of .4359.","PeriodicalId":246071,"journal":{"name":"Proceedings of the 1st International on Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop","volume":"281 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122942711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Representation Learning with Attention and Sequence to Sequence Autoencoders to Predict Sleepiness From Speech","authors":"S. Amiriparian, Pawel Winokurow, Vincent Karas, Sandra Ottl, Maurice Gerczuk, Björn Schuller","doi":"10.1145/3423327.3423670","DOIUrl":"https://doi.org/10.1145/3423327.3423670","url":null,"abstract":"Motivated by the attention mechanism of the human visual system and recent developments in the field of machine translation, we introduce our attention-based and recurrent sequence to sequence autoencoders for fully unsupervised representation learning from audio files. In particular, we test the efficacy of our novel approach on the task of speech-based sleepiness recognition. We evaluate the learnt representations from both autoencoders, and conduct an early fusion to ascertain possible complementarity between them. In our frameworks, we first extract Mel-spectrograms from raw audio. Second, we train recurrent autoencoders on these spectrograms which are considered as time-dependent frequency vectors. Afterwards, we extract the activations of specific fully connected layers of the autoencoders which represent the learnt features of spectrograms for the corresponding audio instances. Finally, we train support vector regressors on these representations to obtain the predictions. On the development partition of the data, we achieve Spearman's correlation coefficients of .324, .283, and .320 with the targets on the Karolinska Sleepiness Scale by utilising attention and non-attention autoencoders, and the fusion of both autoencoders' representations, respectively. In the same order, we achieve .311, .359, and .367 Spearman's correlation coefficients on the test data, indicating the suitability of our proposed fusion strategy.","PeriodicalId":246071,"journal":{"name":"Proceedings of the 1st International on Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop","volume":"298 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115925521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"End2You","authors":"Panagiotis Tzirakis","doi":"10.1145/3423327.3423513","DOIUrl":"https://doi.org/10.1145/3423327.3423513","url":null,"abstract":"Multimodal profiling is a fundamental component towards a complete interaction between human and machine. This is an important task for intelligent systems as they can automatically sense and adapt their responses according to the human behavior. The last 10 years, several advancements have been accomplished with the use of Deep Neural Networks (DNNs) in several areas including but not limited to affect recognition[1,2]. Convolution and recurrent neural networks are core components of DNNs that have been extensively used to extract robust spatial and temporal features, accordingly. To this end, we introduce End2You[3] an open-source toolkit implemented in Python and based on Tensorflow. It provides capabilities to train and evaluate models in an end-to-end manner, i.e., using raw input. It supports input from raw audio, visual, physiological or other types of information, and the output can be of an arbitrary representation, for either classification or regression tasks. Well known audio- and visual-model implementations are provided including ResNet[4], and MobileNet[5]. It can also capture the temporal dynamics in the signal, utilizing recurrent neural networks such as Long Short-Term Memory (LSTM). The toolkit also provides pretrained unimodal and multimodal models for the emotion recognition task using the RECOLA dataset[6]. To our knowledge, this is the first toolkit that provides generic end-to-end learning for profiling capabilities in either unimodal or multimodal cases. We depict results of the toolkit on the RECOLA dataset and show how it can be used on different datasets.","PeriodicalId":246071,"journal":{"name":"Proceedings of the 1st International on Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115399065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}