{"title":"TEMM: text-enhanced multi-interactive attention and multitask learning network for multimodal sentiment analysis","authors":"Bengong Yu, Zhongyu Shi","doi":"10.1007/s11227-024-06422-0","DOIUrl":null,"url":null,"abstract":"<p>Multimodal sentiment analysis is an important and active research field. Most methods construct fusion modules based on unimodal representations generated by pretrained models, which lack the deep interaction of multimodal information, especially the rich semantic-emotional information embedded in text. In addition, previous studies have focused on capturing modal coherence information and ignored differentiated information. We propose a text-enhanced multi-interactive attention and multitask learning network (TEMM). First, syntactic dependency graphs and sentiment graphs of the text are constructed, and additional graph embedding representations of the text are obtained using graph convolutional networks and graph attention networks. Then, self-attention and cross-modal attention are applied to explore intramodal and intermodal dynamic interactions, using text as the main cue. Finally, a multitask learning framework is constructed to exert control over the information flow by monitoring the mutual information between the unimodal and multimodal representations and exploiting the classification properties of the unimodal modality to achieve a more comprehensive focus on modal information. The experimental results on the CMU-MOSI, CMU-MOSEI, and CH-SIMS datasets show that the proposed model outperforms state-of-the-art models.</p>","PeriodicalId":501596,"journal":{"name":"The Journal of Supercomputing","volume":"122 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Journal of Supercomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s11227-024-06422-0","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Multimodal sentiment analysis is an important and active research field. Most existing methods build fusion modules on top of unimodal representations produced by pretrained models, which lack deep interaction among modalities, especially with the rich semantic and emotional information embedded in text. In addition, previous studies have focused on capturing cross-modal consistency information while ignoring modality-specific, differentiated information. We propose a text-enhanced multi-interactive attention and multitask learning network (TEMM). First, syntactic dependency graphs and sentiment graphs of the text are constructed, and additional graph embedding representations of the text are obtained using graph convolutional networks and graph attention networks. Then, self-attention and cross-modal attention are applied to explore intramodal and intermodal dynamic interactions, with text serving as the main cue. Finally, a multitask learning framework is constructed to control the information flow by monitoring the mutual information between unimodal and multimodal representations and by exploiting the classification behavior of each individual modality, yielding a more comprehensive coverage of modal information. Experimental results on the CMU-MOSI, CMU-MOSEI, and CH-SIMS datasets show that the proposed model outperforms state-of-the-art models.
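To make the "text as the main cue" attention step concrete, the sketch below shows one way the intramodal (self-attention) and intermodal (cross-modal attention) interactions described in the abstract could be wired together. It is a minimal illustration, not the authors' implementation: the module name, feature dimensions, and the use of PyTorch's nn.MultiheadAttention are assumptions made for clarity; the paper's actual layer sizes and graph-enhanced text features are not reproduced here.

```python
import torch
import torch.nn as nn


class TextGuidedCrossModalAttention(nn.Module):
    """Hypothetical sketch: text features serve as the query (main cue),
    while another modality (audio or visual) supplies keys and values."""

    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        # Intramodal interaction: text attends to itself.
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Intermodal interaction: text queries the other modality.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        # Self-attention over the text sequence (intramodal dynamics).
        t, _ = self.self_attn(text, text, text)
        text = self.norm1(text + t)
        # Cross-modal attention with text as query (intermodal dynamics).
        c, _ = self.cross_attn(text, other, other)
        return self.norm2(text + c)


if __name__ == "__main__":
    # Illustrative shapes only: (batch, sequence length, feature dim),
    # with both modalities already projected to a shared dimension.
    text = torch.randn(8, 50, 128)
    audio = torch.randn(8, 200, 128)
    fused = TextGuidedCrossModalAttention()(text, audio)
    print(fused.shape)  # torch.Size([8, 50, 128])
```

In this reading, one such block would be instantiated per non-text modality, and the resulting text-anchored representations would then feed the multitask heads that monitor mutual information between the unimodal and fused representations.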