{"title":"多注意多模态情感分析","authors":"Taeyong Kim, Bowon Lee","doi":"10.1145/3372278.3390698","DOIUrl":null,"url":null,"abstract":"Sentiment analysis plays an important role in natural-language processing. It has been performed on multimodal data including text, audio, and video. Previously conducted research does not make full utilization of such heterogeneous data. In this study, we propose a model of Multi-Attention Recurrent Neural Network (MA-RNN) for performing sentiment analysis on multimodal data. The proposed network consists of two attention layers and a Bidirectional Gated Recurrent Neural Network (BiGRU). The first attention layer is used for data fusion and dimensionality reduction, and the second attention layer is used for the augmentation of BiGRU to capture key parts of the contextual information among utterances. Experiments on multimodal sentiment analysis indicate that our proposed model achieves the state-of-the-art performance of 84.31% accuracy on the Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis (CMU-MOSI) dataset. Furthermore, an ablation study is conducted to evaluate the contributions of different components of the network. We believe that our findings of this study may also offer helpful insights into the design of models using multimodal data.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"138 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":"{\"title\":\"Multi-Attention Multimodal Sentiment Analysis\",\"authors\":\"Taeyong Kim, Bowon Lee\",\"doi\":\"10.1145/3372278.3390698\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sentiment analysis plays an important role in natural-language processing. It has been performed on multimodal data including text, audio, and video. 
Previously conducted research does not make full utilization of such heterogeneous data. In this study, we propose a model of Multi-Attention Recurrent Neural Network (MA-RNN) for performing sentiment analysis on multimodal data. The proposed network consists of two attention layers and a Bidirectional Gated Recurrent Neural Network (BiGRU). The first attention layer is used for data fusion and dimensionality reduction, and the second attention layer is used for the augmentation of BiGRU to capture key parts of the contextual information among utterances. Experiments on multimodal sentiment analysis indicate that our proposed model achieves the state-of-the-art performance of 84.31% accuracy on the Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis (CMU-MOSI) dataset. Furthermore, an ablation study is conducted to evaluate the contributions of different components of the network. We believe that our findings of this study may also offer helpful insights into the design of models using multimodal data.\",\"PeriodicalId\":158014,\"journal\":{\"name\":\"Proceedings of the 2020 International Conference on Multimedia Retrieval\",\"volume\":\"138 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"17\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2020 International Conference on Multimedia Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3372278.3390698\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 International Conference on Multimedia 
Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3372278.3390698","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Sentiment analysis plays an important role in natural-language processing and has been performed on multimodal data comprising text, audio, and video. Prior research, however, has not fully exploited such heterogeneous data. In this study, we propose a Multi-Attention Recurrent Neural Network (MA-RNN) for sentiment analysis on multimodal data. The proposed network consists of two attention layers and a bidirectional gated recurrent unit (BiGRU) network. The first attention layer performs data fusion and dimensionality reduction; the second augments the BiGRU by attending to the key parts of the contextual information among utterances. Experiments on multimodal sentiment analysis show that the proposed model achieves state-of-the-art performance of 84.31% accuracy on the Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis (CMU-MOSI) dataset. Furthermore, an ablation study evaluates the contribution of each component of the network. We believe the findings of this study may also offer helpful insights into the design of models using multimodal data.
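The pipeline the abstract describes (attention-based modality fusion, a BiGRU over utterances, then a second attention layer over the hidden states) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the dimensions, random parameterization, and scoring functions are placeholders, not the authors' actual configuration or training setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h_dim, T = 8, 4, 5  # feature dim, GRU hidden dim, utterances per video

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fuse(modalities, w):
    # First attention layer: score each modality's feature vector, then
    # take the attention-weighted sum as one fused, lower-dimensional input.
    X = np.stack(modalities)          # (3, d): text, audio, video features
    alpha = softmax(X @ w)            # (3,) attention weights over modalities
    return alpha @ X, alpha

def gru_step(h, x, p):
    # One GRU cell update; used for both directions of the BiGRU.
    Wz, Uz, Wr, Ur, Wh, Uh = p
    z = 1 / (1 + np.exp(-(Wz @ x + Uz @ h)))   # update gate
    r = 1 / (1 + np.exp(-(Wr @ x + Ur @ h)))   # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h))    # candidate state
    return (1 - z) * h + z * h_cand

def make_params():
    # Alternating input (h_dim, d) and recurrent (h_dim, h_dim) weights.
    return [rng.normal(scale=0.1, size=(h_dim, d if i % 2 == 0 else h_dim))
            for i in range(6)]

w_fuse = rng.normal(size=d)
p_fwd, p_bwd = make_params(), make_params()
w_att = rng.normal(size=2 * h_dim)

# Toy per-utterance features for the three modalities.
utterances = [[rng.normal(size=d) for _ in range(3)] for _ in range(T)]
fused = [attention_fuse(u, w_fuse)[0] for u in utterances]  # T vectors of (d,)

# BiGRU over the fused utterance sequence.
h, fwd = np.zeros(h_dim), []
for x in fused:
    h = gru_step(h, x, p_fwd)
    fwd.append(h)
h, bwd = np.zeros(h_dim), []
for x in reversed(fused):
    h = gru_step(h, x, p_bwd)
    bwd.append(h)
H = np.stack([np.concatenate([f, b]) for f, b in zip(fwd, reversed(bwd))])

# Second attention layer: weight the BiGRU states so that the most
# informative utterances dominate the final context representation.
beta = softmax(H @ w_att)             # (T,) attention over utterances
context = beta @ H                    # (2 * h_dim,) video-level vector
```

In a full model, `context` would feed a small classifier head trained for sentiment prediction; here only the forward pass is shown.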