{"title":"Information Bottleneck based Representation Learning for Multimodal Sentiment Analysis","authors":"Tonghui Zhang, Haiying Zhang, Shuke Xiang, Tong Wu","doi":"10.1145/3522749.3523069","DOIUrl":null,"url":null,"abstract":"Recently, Multimodal Sentiment Analysis (MSA) has become a hot research topic of cross modal research in artificial intelligence domain. For this task, the research focuses on extract comprehensive information which dispersed in different modalities. In existing research works, some paid attention to the ingenious fusion method inspired by the consideration of intra-modality and inter-modality reaction, while others devoted to remove task-irrelevant information to refine single modal representation by imposing constraints. However, both of these are limited to the lack of effective control over information in the learning of multimodal representation. It may loss task-relevant information or introduce extra noise. In order to address the afore-mentioned issue, we propose a framework named Multimodal Information Bottleneck (MMIB) in this paper. By imposing mutual information constraints between different modal pairs (text-visual, acoustic-visual, text-acoustic) to control the maximization of mutual information between different modalities and minimization of mutual information inside single modalities, the task-irrelevant information in a single modal can be removed efficiency while kept the related ones, so that the multimodal representation is improved greatly. By the experiments on two widely used public datasets, it demonstrates that our proposed method outperforms existing methods (like MAG-BERT, Self-MM) in binary-classification and achieves a comparable performance in other evaluation metrics.","PeriodicalId":361473,"journal":{"name":"Proceedings of the 6th International Conference on Control Engineering and Artificial Intelligence","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 6th International Conference on Control Engineering and Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3522749.3523069","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Recently, Multimodal Sentiment Analysis (MSA) has become a hot topic in cross-modal research within the artificial intelligence domain. The task requires extracting comprehensive information that is dispersed across different modalities. Among existing works, some focus on ingenious fusion methods motivated by intra-modality and inter-modality interactions, while others impose constraints to remove task-irrelevant information and refine the single-modality representations. However, both lines of work lack effective control over information during multimodal representation learning: they may lose task-relevant information or introduce extra noise. To address this issue, we propose a framework named Multimodal Information Bottleneck (MMIB). By imposing mutual information constraints on different modal pairs (text-visual, acoustic-visual, text-acoustic), MMIB maximizes the mutual information between different modalities while minimizing the mutual information within each single modality, so that task-irrelevant information in a single modality is removed efficiently while task-relevant information is retained, greatly improving the multimodal representation. Experiments on two widely used public datasets demonstrate that our proposed method outperforms existing methods (such as MAG-BERT and Self-MM) on binary classification and achieves comparable performance on other evaluation metrics.
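
The abstract does not state the exact training objective, but the description maps naturally onto an information-bottleneck-style loss. The following is a minimal sketch of such an objective, assuming a standard IB decomposition; the symbols and trade-off weights are illustrative, not taken from the paper. Here z_m is the learned representation of modality m, x_m its raw input, y the sentiment label, and beta, lambda are assumed balancing coefficients; the label-relevance term I(z_m; y) follows the generic IB formulation and may differ from the authors' actual loss.

\[
\mathcal{L}_{\mathrm{MMIB}}
= \sum_{m \in \{t,\,a,\,v\}} \Big( I(z_m; x_m) - \beta\, I(z_m; y) \Big)
\; - \; \lambda \sum_{(i,j) \in \{(t,v),\,(a,v),\,(t,a)\}} I(z_i; z_j)
\]

Minimizing this sketch compresses each unimodal representation (discarding task-irrelevant content via the I(z_m; x_m) term), preserves label-relevant information through I(z_m; y), and maximizes the mutual information between the three modal pairs named in the abstract; in practice such mutual information terms would be approximated with variational bounds or neural estimators.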