Integrating Cross-modal Interactions via Latent Representation Shift for Multi-modal Humor Detection
Authors: Chengxin Chen, Pengyuan Zhang
DOI: 10.1145/3551876.3554805
Published in: Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge, 2022-10-10
Citations: 8
Abstract
Multi-modal sentiment analysis has been an active research area and has attracted increasing attention from multi-disciplinary communities. However, it remains challenging to fuse information from different modalities efficiently. In prior studies, the late fusion strategy has been commonly adopted due to its simplicity and efficacy. Unfortunately, it fails to model the interactions across different modalities. In this paper, we propose a transformer-based hierarchical framework to effectively model both the intrinsic semantics and the cross-modal interactions of the relevant modalities. Specifically, the features of each modality are first encoded via standard transformers. The cross-modal interactions from one modality to the other modalities are then computed using cross-modal transformers. The derived intrinsic semantics and cross-modal interactions are used to determine the latent representation shift of a particular modality. We evaluate the proposed approach on the MuSe-Humor sub-challenge of the Multi-modal Sentiment Analysis Challenge (MuSe) 2022. Experimental results show that an Area Under the Curve (AUC) of 0.9065 can be achieved on the MuSe-Humor test set. With these promising results, our best submission ranked first in the sub-challenge.
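The core idea described above — encoding each modality on its own, attending from one modality to the others, and then shifting the modality's latent representation by the resulting interactions — can be sketched as follows. This is a minimal, illustrative numpy sketch, not the authors' implementation: the single-head scaled dot-product attention, the additive shift, and the `gate` parameter are assumptions made for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(h_a, h_b):
    """Single-head cross-modal attention (illustrative): queries come from
    modality A, keys and values from modality B.

    h_a: (T_a, d) encoded features of modality A (e.g. from its transformer)
    h_b: (T_b, d) encoded features of modality B
    returns: (T_a, d) interactions from B attended to by A
    """
    d = h_a.shape[-1]
    scores = h_a @ h_b.T / np.sqrt(d)       # (T_a, T_b) attention logits
    return softmax(scores, axis=-1) @ h_b   # weighted sum of B's features

def shifted_representation(h_a, others, gate=0.5):
    """Shift A's intrinsic representation by its summed cross-modal
    interactions (hypothetical additive gating)."""
    shift = sum(cross_modal_attention(h_a, h_b) for h_b in others)
    return h_a + gate * shift

# Example: text features shifted by audio and visual interactions.
rng = np.random.default_rng(0)
h_text = rng.standard_normal((5, 8))    # 5 time steps, dim 8
h_audio = rng.standard_normal((7, 8))
h_video = rng.standard_normal((6, 8))
h_text_shifted = shifted_representation(h_text, [h_audio, h_video])
```

In the paper's framework each modality plays the role of A in turn, so every modality's representation is shifted by interactions computed from all the others before the final humor prediction.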