{"title":"A Multi-View Co-Learning Method for Multimodal Sentiment Analysis","authors":"Wenxiu Geng, Yulong Bian, Xiangxian Li","doi":"10.1109/ICME55011.2023.00238","DOIUrl":null,"url":null,"abstract":"Existing works on multimodal sentiment analysis have focused on learning more discriminative unimodal sentiment information or improving multimodal fusion methods to enhance modal complementarity. However, practical results of these methods have been limited owing to the problems of insufficient intra-modal representation and inter-modal noise. To alleviate this problem, we propose a multi-view co-learning method (MVATF) for video sentiment analysis. First, we propose a multi-view features extraction module to capture more perspectives from a single modality. Second, we propose a two-level fusion sentiment enhancement strategy that uses hierarchical attentive learning fusion and a multi-task learning fusion module to achieve co-learning to effectively filter inter-modal noise for better multimodal sentiment fusion features. Experimental results on the CH-SIMS, CMU-MOSI and MOSEI datasets show that the proposed method outperforms the state-of-the-art methods.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Conference on Multimedia and Expo (ICME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICME55011.2023.00238","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Existing work on multimodal sentiment analysis has focused on learning more discriminative unimodal sentiment information or on improving multimodal fusion methods to enhance modal complementarity. However, the practical results of these methods have been limited by insufficient intra-modal representation and inter-modal noise. To alleviate these problems, we propose a multi-view co-learning method (MVATF) for video sentiment analysis. First, we propose a multi-view feature extraction module to capture more perspectives from a single modality. Second, we propose a two-level fusion sentiment enhancement strategy that combines hierarchical attentive learning fusion with a multi-task learning fusion module to achieve co-learning, effectively filtering inter-modal noise and producing better multimodal sentiment fusion features. Experimental results on the CH-SIMS, CMU-MOSI, and MOSEI datasets show that the proposed method outperforms state-of-the-art methods.
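
The abstract describes a pipeline of per-modality multi-view feature extraction followed by two-level fusion (hierarchical attentive fusion plus multi-task prediction heads). The sketch below only illustrates that general structure; the module names, hidden sizes, the pooling-based choice of "views", and the single-score regression heads are assumptions for illustration and do not reproduce the authors' MVATF implementation.

```python
# Minimal illustrative sketch (PyTorch) of multi-view extraction and
# two-level attentive/multi-task fusion, as described in the abstract.
# All design details here are assumptions, not the paper's actual model.
import torch
import torch.nn as nn


class MultiViewExtractor(nn.Module):
    """Produce several 'views' of one modality's feature sequence
    (temporal, average-pooled, max-pooled) -- an assumed choice of views."""

    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.rnn = nn.GRU(in_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(in_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, in_dim) -> views: (batch, 3, hidden_dim)
        _, h_last = self.rnn(x)                     # temporal view
        mean_view = self.proj(x.mean(dim=1))        # global-average view
        max_view = self.proj(x.max(dim=1).values)   # salient-feature view
        return torch.stack([h_last.squeeze(0), mean_view, max_view], dim=1)


class AttentiveFusion(nn.Module):
    """Attention-weighted pooling over a set of vectors (views or modalities)."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, n, dim) -> fused: (batch, dim)
        weights = torch.softmax(self.score(feats), dim=1)
        return (weights * feats).sum(dim=1)


class TwoLevelFusionModel(nn.Module):
    """Level 1 fuses views within each modality; level 2 fuses modalities.
    Multi-task heads predict one unimodal score per modality plus a joint score."""

    def __init__(self, dims: dict, hidden_dim: int = 128):
        super().__init__()
        self.extractors = nn.ModuleDict(
            {m: MultiViewExtractor(d, hidden_dim) for m, d in dims.items()})
        self.view_fusion = nn.ModuleDict(
            {m: AttentiveFusion(hidden_dim) for m in dims})
        self.modality_fusion = AttentiveFusion(hidden_dim)
        self.uni_heads = nn.ModuleDict(
            {m: nn.Linear(hidden_dim, 1) for m in dims})
        self.multi_head = nn.Linear(hidden_dim, 1)

    def forward(self, inputs: dict) -> dict:
        # Level 1: attention over the views of each modality.
        uni = {m: self.view_fusion[m](self.extractors[m](x))
               for m, x in inputs.items()}
        # Level 2: attention over the fused unimodal representations.
        fused = self.modality_fusion(torch.stack(list(uni.values()), dim=1))
        # Multi-task outputs: unimodal sentiment scores + multimodal score.
        out = {m: self.uni_heads[m](h) for m, h in uni.items()}
        out["multimodal"] = self.multi_head(fused)
        return out


if __name__ == "__main__":
    # Hypothetical feature dimensions for text/audio/vision inputs.
    model = TwoLevelFusionModel({"text": 768, "audio": 74, "vision": 35})
    batch = {"text": torch.randn(4, 50, 768),
             "audio": torch.randn(4, 50, 74),
             "vision": torch.randn(4, 50, 35)}
    scores = model(batch)
    print({k: v.shape for k, v in scores.items()})
```

Training such a model with a loss on every head (each unimodal score and the multimodal score) is one way to realize the co-learning idea the abstract mentions, since the shared extractors receive gradients from all tasks.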