TEMM: text-enhanced multi-interactive attention and multitask learning network for multimodal sentiment analysis
Bengong Yu, Zhongyu Shi
The Journal of Supercomputing, published 2024-08-12. DOI: 10.1007/s11227-024-06422-0 (https://doi.org/10.1007/s11227-024-06422-0)
Multimodal sentiment analysis is an important and active research field. Most methods build fusion modules on unimodal representations generated by pretrained models, and so lack deep interaction among multimodal information, especially the rich semantic and emotional information embedded in text. In addition, previous studies have focused on capturing information that is consistent across modalities and have ignored information that differs between them. We propose a text-enhanced multi-interactive attention and multitask learning network (TEMM). First, syntactic dependency graphs and sentiment graphs of the text are constructed, and additional graph embedding representations of the text are obtained using graph convolutional networks and graph attention networks. Then, self-attention and cross-modal attention are applied to explore intramodal and intermodal dynamic interactions, using text as the main cue. Finally, a multitask learning framework is constructed to control the information flow by monitoring the mutual information between the unimodal and multimodal representations and by exploiting the classification properties of each modality, so that the model attends to modal information more comprehensively. Experimental results on the CMU-MOSI, CMU-MOSEI, and CH-SIMS datasets show that the proposed model outperforms state-of-the-art models.
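The abstract does not give the equations for the cross-modal attention step, so the following is only a minimal sketch of one common reading of "using text as the main cue": scaled dot-product attention in which text features act as queries and another modality's features act as keys and values. The function name `cross_modal_attention`, the feature sizes, and the absence of learned projection matrices are all assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def cross_modal_attention(text, other):
    """Scaled dot-product attention with text as the queries.

    text:  T_t x d matrix of text token features
    other: T_o x d matrix of audio or visual frame features
    Returns (output, weights) with shapes T_t x d and T_t x T_o.
    Assumption: queries = text, keys = values = other, no learned projections.
    """
    d = len(text[0])
    scores = matmul(text, transpose(other))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, other), weights


# Hypothetical toy inputs: 3 text tokens and 5 audio frames, feature size 4.
random.seed(0)
text = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
audio = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]

fused, attn = cross_modal_attention(text, audio)
print(len(fused), len(fused[0]))  # 3 4
```

Each row of `attn` sums to 1, so every text token receives a weighted average of the audio frames. In this reading the output keeps the length of the text sequence, which is one way the text modality can steer what is drawn from the other modalities.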