Affective knowledge assisted bi-directional learning for Multi-modal Aspect-based Sentiment Analysis

IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Xuefeng Shi, Ming Yang, Min Hu, Fuji Ren, Xin Kang, Weiping Ding
{"title":"情感知识辅助双向学习的多模态情感分析","authors":"Xuefeng Shi ,&nbsp;Ming Yang ,&nbsp;Min Hu ,&nbsp;Fuji Ren ,&nbsp;Xin Kang ,&nbsp;Weiping Ding","doi":"10.1016/j.csl.2024.101755","DOIUrl":null,"url":null,"abstract":"<div><div>As a fine-grained task in the community of Multi-modal Sentiment Analysis (MSA), Multi-modal Aspect-based Sentiment Analysis (MABSA) is challenging and has attracted numerous researchers’ attention, and prominent progress has been achieved in recent years. However, there is still a lack of effective strategies for feature alignment between different modalities, and further exploration is urgently needed. Thus, this paper proposed a novel MABSA method to enhance the sentiment feature alignment, namely Affective Knowledge-Assisted Bi-directional Learning (AKABL) networks, which learn sentiment information from different modalities through multiple perspectives. Specifically, AKABL gains the textual semantic and syntactic features through encoding text modality via pre-trained language model BERT and Syntax Parser SpaCy, respectively. And then, to strengthen the expression of sentiment information in the syntactic graph, affective knowledge SenticNet is introduced to assist AKABL in comprehending textual sentiment information. On the other side, to leverage image modality efficiently, the pre-trained model Visual Transformer (ViT) is employed to extract the necessary image features. Additionally, to integrate the obtained features, this paper utilizes the module Single Modality GCN (SMGCN) to achieve the joint textual semantic and syntactic representation. And to bridge the textual and image features, the module Double Modalities GCN (DMGCN) is devised and applied to extract the sentiment information from different modalities simultaneously. Besides, to bridge the alignment gap between text and image features, this paper devises a novel alignment strategy to build the relationship between these two representations, which measures that difference with the Jensen–Shannon divergence from bi-directional perspectives. It is worth noting that cross-attention and cosine distance-based similarity are also applied in the proposed AKABL. To validate the effectiveness of the proposed method, extensive experiments are conducted on two widely used and public benchmark datasets, and the experimental results demonstrate that AKABL can improve the tasks’ performance obviously, which outperforms the optimal baseline with accuracy improvement of 0.47% and 0.72% on the two datasets.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"91 ","pages":"Article 101755"},"PeriodicalIF":3.4000,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Affective knowledge assisted bi-directional learning for Multi-modal Aspect-based Sentiment Analysis\",\"authors\":\"Xuefeng Shi ,&nbsp;Ming Yang ,&nbsp;Min Hu ,&nbsp;Fuji Ren ,&nbsp;Xin Kang ,&nbsp;Weiping Ding\",\"doi\":\"10.1016/j.csl.2024.101755\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>As a fine-grained task in the community of Multi-modal Sentiment Analysis (MSA), Multi-modal Aspect-based Sentiment Analysis (MABSA) is challenging and has attracted numerous researchers’ attention, and prominent progress has been achieved in recent years. However, there is still a lack of effective strategies for feature alignment between different modalities, and further exploration is urgently needed. 
Thus, this paper proposed a novel MABSA method to enhance the sentiment feature alignment, namely Affective Knowledge-Assisted Bi-directional Learning (AKABL) networks, which learn sentiment information from different modalities through multiple perspectives. Specifically, AKABL gains the textual semantic and syntactic features through encoding text modality via pre-trained language model BERT and Syntax Parser SpaCy, respectively. And then, to strengthen the expression of sentiment information in the syntactic graph, affective knowledge SenticNet is introduced to assist AKABL in comprehending textual sentiment information. On the other side, to leverage image modality efficiently, the pre-trained model Visual Transformer (ViT) is employed to extract the necessary image features. Additionally, to integrate the obtained features, this paper utilizes the module Single Modality GCN (SMGCN) to achieve the joint textual semantic and syntactic representation. And to bridge the textual and image features, the module Double Modalities GCN (DMGCN) is devised and applied to extract the sentiment information from different modalities simultaneously. Besides, to bridge the alignment gap between text and image features, this paper devises a novel alignment strategy to build the relationship between these two representations, which measures that difference with the Jensen–Shannon divergence from bi-directional perspectives. It is worth noting that cross-attention and cosine distance-based similarity are also applied in the proposed AKABL. To validate the effectiveness of the proposed method, extensive experiments are conducted on two widely used and public benchmark datasets, and the experimental results demonstrate that AKABL can improve the tasks’ performance obviously, which outperforms the optimal baseline with accuracy improvement of 0.47% and 0.72% on the two datasets.</div></div>\",\"PeriodicalId\":50638,\"journal\":{\"name\":\"Computer Speech and Language\",\"volume\":\"91 \",\"pages\":\"Article 101755\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2024-12-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Speech and Language\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0885230824001372\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Speech and Language","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0885230824001372","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract


As a fine-grained task in the community of Multi-modal Sentiment Analysis (MSA), Multi-modal Aspect-based Sentiment Analysis (MABSA) is challenging and has attracted considerable attention from researchers, with prominent progress achieved in recent years. However, effective strategies for aligning features across modalities are still lacking, and further exploration is urgently needed. This paper therefore proposes a novel MABSA method, the Affective Knowledge-Assisted Bi-directional Learning (AKABL) network, which strengthens sentiment feature alignment by learning sentiment information from different modalities through multiple perspectives. Specifically, AKABL obtains textual semantic and syntactic features by encoding the text modality with the pre-trained language model BERT and the syntax parser spaCy, respectively. To strengthen the expression of sentiment information in the syntactic graph, the affective knowledge base SenticNet is introduced to help AKABL comprehend textual sentiment. On the image side, the pre-trained Vision Transformer (ViT) is employed to extract the necessary image features. To integrate the obtained features, the Single Modality GCN (SMGCN) module produces a joint textual semantic and syntactic representation, and the Double Modalities GCN (DMGCN) module is devised to extract sentiment information from both modalities simultaneously. Furthermore, to close the alignment gap between the text and image features, a novel alignment strategy builds the relationship between the two representations by measuring their difference with the Jensen–Shannon divergence from bi-directional perspectives; cross-attention and cosine distance-based similarity are also applied in AKABL. To validate the effectiveness of the proposed method, extensive experiments are conducted on two widely used public benchmark datasets. The results demonstrate that AKABL clearly improves task performance, outperforming the strongest baseline by 0.47% and 0.72% accuracy on the two datasets.
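As a rough illustration of the alignment idea described in the abstract, the sketch below shows how the gap between the text-side and image-side sentiment representations could be measured with the Jensen–Shannon divergence, together with a cosine distance-based term. This is a minimal sketch, not the authors' released code: the function names, tensor shapes, temperature, and the way the representations are turned into distributions are illustrative assumptions.

```python
# Sketch of a bi-directional (symmetric) alignment objective between text and
# image representations, using the Jensen-Shannon divergence plus a cosine
# distance term. Shapes and hyperparameters are assumptions, not the paper's.
import torch
import torch.nn.functional as F

def js_alignment_loss(text_repr: torch.Tensor,
                      image_repr: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """Jensen-Shannon divergence between text and image feature distributions.

    Both inputs are assumed to be (batch, dim) representations produced by the
    text-side and image-side modules; a softmax turns them into distributions.
    """
    p = F.softmax(text_repr / temperature, dim=-1)
    q = F.softmax(image_repr / temperature, dim=-1)
    m = 0.5 * (p + q)
    # JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m);
    # F.kl_div expects log-probabilities as its first argument.
    kl_pm = F.kl_div(m.log(), p, reduction="batchmean")
    kl_qm = F.kl_div(m.log(), q, reduction="batchmean")
    return 0.5 * (kl_pm + kl_qm)

def cosine_alignment_loss(text_repr: torch.Tensor,
                          image_repr: torch.Tensor) -> torch.Tensor:
    # Cosine distance-based similarity term mentioned in the abstract:
    # push paired text/image representations toward each other.
    return (1.0 - F.cosine_similarity(text_repr, image_repr, dim=-1)).mean()
```

In training, such alignment terms would typically be added to the aspect-level classification loss with small weights; the abstract does not specify the exact combination, so the weighting is left open here.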
Source journal
Computer Speech and Language (Engineering & Technology – Computer Science: Artificial Intelligence)
CiteScore: 11.30
Self-citation rate: 4.70%
Publication volume: 80
Review time: 22.9 weeks
Journal description: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.