Emotion recognition using multimodal matchmap fusion and multi-task learning

Ricardo Pizarro, Juan Bekios-Calfa
{"title":"Emotion recognition using multimodal matchmap fusion and multi-task learning","authors":"Ricardo Pizarro, Juan Bekios-Calfa","doi":"10.1049/icp.2021.1454","DOIUrl":null,"url":null,"abstract":"Emotion recognition is a complex task due to the great intraclass and inter-class variability that exists implicitly in the problem. From the point of view of the intra-class, an emotion can be expressed by different people, which generates different representations of it. For the inter-class case, there are some kinds of emotions that are alike. Traditionally, the problem has been approached in different ways, highlighting the analysis of images to determine the facial expression of a person to extrapolate it to a type of emotion, also, the use of audio sequences to estimate the emotion of the speaker. The present work seeks to solve this problem using multimodal techniques, multitask and Deep Learning. To help with these problems, the use of a fusion method based on the similarity between audio and video modalities will be investigated and applied to the emotion classification problem. The use of this method allows the use of auxiliary tasks that enhance the learned relationships between the emotions shown in video frames and audio frames belonging to the same emotion label and punish those that are different. The results show that when using the fusion method based on the similarity of modalities together with the use of multiple tasks, the classification is improved by 7% with respect to the classification obtained in the baseline model that uses concatenation of the characteristics of each modality, the experiments are performed on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database.","PeriodicalId":431144,"journal":{"name":"11th International Conference of Pattern Recognition Systems (ICPRS 2021)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"11th International Conference of Pattern Recognition Systems (ICPRS 2021)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1049/icp.2021.1454","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Emotion recognition is a complex task because of the large intra-class and inter-class variability inherent in the problem. Within a class, the same emotion can be expressed by different people, producing different representations of it; across classes, some emotions resemble one another. Traditionally, the problem has been approached in separate ways: analysing images to determine a person's facial expression and map it to an emotion, or using audio sequences to estimate the speaker's emotion. The present work addresses the problem with multimodal techniques, multi-task learning, and deep learning. A fusion method based on the similarity between the audio and video modalities is investigated and applied to emotion classification. This method enables auxiliary tasks that strengthen the learned relationships between video frames and audio frames sharing the same emotion label and penalise those that differ. The results show that combining the similarity-based fusion method with multiple tasks improves classification by 7% over a baseline model that concatenates the features of each modality. The experiments are performed on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database.
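The abstract does not include an implementation, but the fusion idea can be illustrated with a short PyTorch sketch: frame-level features from each modality are projected into a shared space, a matchmap of pairwise cosine similarities between video and audio frames drives the fusion, and an auxiliary loss rewards similarity for same-label pairs and penalises it for different-label pairs. All module names, dimensions, the attention-style pooling, and the margin-based auxiliary loss below are assumptions for illustration, not the authors' exact architecture.

```python
# Minimal sketch of similarity-based matchmap fusion with an auxiliary
# multi-task objective (illustrative; not the paper's exact implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MatchmapFusionClassifier(nn.Module):
    def __init__(self, video_dim=512, audio_dim=128, embed_dim=256, n_emotions=4):
        super().__init__()
        # Project each modality into a shared embedding space (assumed dims).
        self.video_proj = nn.Linear(video_dim, embed_dim)
        self.audio_proj = nn.Linear(audio_dim, embed_dim)
        self.classifier = nn.Linear(embed_dim, n_emotions)

    def forward(self, video_feats, audio_feats):
        # video_feats: (B, Tv, video_dim), audio_feats: (B, Ta, audio_dim)
        v = F.normalize(self.video_proj(video_feats), dim=-1)   # (B, Tv, D)
        a = F.normalize(self.audio_proj(audio_feats), dim=-1)   # (B, Ta, D)

        # Matchmap: cosine similarity between every video frame and audio frame.
        matchmap = torch.bmm(v, a.transpose(1, 2))              # (B, Tv, Ta)

        # Use each matchmap row as attention over audio frames, then pool over time.
        fused = torch.bmm(matchmap.softmax(dim=-1), a)          # (B, Tv, D)
        clip = (v + fused).mean(dim=1)                          # (B, D)

        logits = self.classifier(clip)
        # Pooled per-modality embeddings feed the auxiliary similarity task.
        return logits, v.mean(dim=1), a.mean(dim=1)

def auxiliary_similarity_loss(v_clip, a_clip, labels, margin=0.5):
    # v_clip, a_clip: (B, D) L2-normalised clip embeddings; labels: (B,).
    sim = v_clip @ a_clip.t()                                    # (B, B)
    same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    # Pull together video/audio pairs that share an emotion label,
    # push apart pairs with different labels (hinge on an assumed margin).
    loss = same * (1.0 - sim) + (1.0 - same) * F.relu(sim - margin)
    return loss.mean()
```

A training step would then combine the main and auxiliary objectives, e.g. `F.cross_entropy(logits, labels) + 0.5 * auxiliary_similarity_loss(F.normalize(v_clip, dim=-1), F.normalize(a_clip, dim=-1), labels)`, where the 0.5 weighting is an assumed hyperparameter.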