Devamanyu Hazarika, Sruthi Gorantla, Soujanya Poria, Roger Zimmermann
{"title":"多模态情感检测的自关注特征级融合","authors":"Devamanyu Hazarika, Sruthi Gorantla, Soujanya Poria, Roger Zimmermann","doi":"10.1109/MIPR.2018.00043","DOIUrl":null,"url":null,"abstract":"Multimodal emotion recognition is the task of detecting emotions present in user-generated multimedia content. Such resources contain complementary information in multiple modalities. A stiff challenge often faced is the complexity associated with feature-level fusion of these heterogeneous modes. In this paper, we propose a new feature-level fusion method based on self-attention mechanism. We also compare it with traditional fusion methods such as concatenation, outer-product, etc. Analyzed using textual and speech (audio) modalities, our results suggest that the proposed fusion method outperforms others in the context of utterance-level emotion recognition in videos.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"44","resultStr":"{\"title\":\"Self-Attentive Feature-Level Fusion for Multimodal Emotion Detection\",\"authors\":\"Devamanyu Hazarika, Sruthi Gorantla, Soujanya Poria, Roger Zimmermann\",\"doi\":\"10.1109/MIPR.2018.00043\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multimodal emotion recognition is the task of detecting emotions present in user-generated multimedia content. Such resources contain complementary information in multiple modalities. A stiff challenge often faced is the complexity associated with feature-level fusion of these heterogeneous modes. In this paper, we propose a new feature-level fusion method based on self-attention mechanism. We also compare it with traditional fusion methods such as concatenation, outer-product, etc. Analyzed using textual and speech (audio) modalities, our results suggest that the proposed fusion method outperforms others in the context of utterance-level emotion recognition in videos.\",\"PeriodicalId\":320000,\"journal\":{\"name\":\"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-04-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"44\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MIPR.2018.00043\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MIPR.2018.00043","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Self-Attentive Feature-Level Fusion for Multimodal Emotion Detection
Multimodal emotion recognition is the task of detecting emotions present in user-generated multimedia content. Such resources contain complementary information across multiple modalities. A stiff challenge often faced is the complexity associated with feature-level fusion of these heterogeneous modalities. In this paper, we propose a new feature-level fusion method based on the self-attention mechanism. We also compare it with traditional fusion methods such as concatenation and the outer product. Evaluated on textual and speech (audio) modalities, our results suggest that the proposed fusion method outperforms the others for utterance-level emotion recognition in videos.
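To make the contrast between the fusion strategies concrete, below is a minimal NumPy sketch of concatenation, outer-product, and one plausible self-attentive fusion of utterance-level text and audio feature vectors. The feature sizes, function names, and the exact attention formulation are illustrative assumptions; the paper's actual architecture may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def concat_fusion(text_feat, audio_feat):
    """Baseline: simple feature concatenation."""
    return np.concatenate([text_feat, audio_feat], axis=-1)

def outer_product_fusion(text_feat, audio_feat):
    """Baseline: flattened outer product of the two modality vectors."""
    return np.outer(text_feat, audio_feat).ravel()

def self_attentive_fusion(text_feat, audio_feat):
    """Illustrative self-attentive fusion (an assumption, not the paper's
    exact formulation): score the modalities against each other, convert
    the scores into per-modality attention weights, and return the
    attention-weighted sum of the (equally sized) modality features."""
    modalities = np.stack([text_feat, audio_feat])      # (2, d)
    scores = modalities @ modalities.T                  # (2, 2) pairwise similarities
    weights = softmax(scores.sum(axis=-1))              # (2,) one weight per modality
    return (weights[:, None] * modalities).sum(axis=0)  # (d,) fused representation

# Toy example with hypothetical 8-dimensional utterance-level features.
rng = np.random.default_rng(0)
text_feat, audio_feat = rng.standard_normal(8), rng.standard_normal(8)
print(concat_fusion(text_feat, audio_feat).shape)          # (16,)
print(outer_product_fusion(text_feat, audio_feat).shape)   # (64,)
print(self_attentive_fusion(text_feat, audio_feat).shape)  # (8,)
```

Note how concatenation and the outer product grow the fused representation linearly and quadratically with the modality dimensions, whereas an attention-weighted combination keeps the fused vector compact, which is one motivation for attention-based fusion.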