Affective knowledge assisted bi-directional learning for Multi-modal Aspect-based Sentiment Analysis
Xuefeng Shi, Ming Yang, Min Hu, Fuji Ren, Xin Kang, Weiping Ding
Computer Speech and Language, Volume 91, Article 101755 (2024). DOI: 10.1016/j.csl.2024.101755
Abstract
As a fine-grained task within Multi-modal Sentiment Analysis (MSA), Multi-modal Aspect-based Sentiment Analysis (MABSA) is challenging and has attracted considerable research attention, with prominent progress achieved in recent years. However, effective strategies for aligning features across modalities are still lacking, and further exploration is urgently needed. This paper therefore proposes a novel MABSA method to enhance sentiment feature alignment, the Affective Knowledge-Assisted Bi-directional Learning (AKABL) network, which learns sentiment information from different modalities through multiple perspectives. Specifically, AKABL obtains textual semantic and syntactic features by encoding the text modality with the pre-trained language model BERT and the syntax parser spaCy, respectively. Then, to strengthen the expression of sentiment information in the syntactic graph, the affective knowledge base SenticNet is introduced to help AKABL comprehend textual sentiment information. For the image modality, the pre-trained Vision Transformer (ViT) is employed to extract the necessary image features. To integrate the obtained features, a Single-Modality GCN (SMGCN) module produces a joint textual semantic and syntactic representation, while a Double-Modality GCN (DMGCN) module is devised to bridge the textual and image features and extract sentiment information from both modalities simultaneously. Furthermore, to close the alignment gap between text and image features, a novel alignment strategy builds the relationship between the two representations, measuring their difference with the Jensen–Shannon divergence from bi-directional perspectives. Cross-attention and cosine distance-based similarity are also applied in the proposed AKABL. To validate the effectiveness of the proposed method, extensive experiments are conducted on two widely used public benchmark datasets; the results demonstrate that AKABL clearly improves task performance, outperforming the strongest baseline by 0.47% and 0.72% in accuracy on the two datasets.
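The abstract describes the text–image alignment only at a high level. As an illustrative aid, the sketch below shows one way a bi-directional Jensen–Shannon divergence alignment term could be computed in PyTorch between pooled text and image representations projected into a shared space. The function names, tensor shapes, and the way the two directions are combined are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def js_divergence(p_logits: torch.Tensor, q_logits: torch.Tensor) -> torch.Tensor:
    """Jensen-Shannon divergence between two batches of feature logits."""
    p = F.softmax(p_logits, dim=-1)
    q = F.softmax(q_logits, dim=-1)
    m = 0.5 * (p + q)
    # F.kl_div expects log-probabilities as its first argument and
    # computes KL(target || input_distribution).
    kl_pm = F.kl_div(m.log(), p, reduction="batchmean")  # KL(p || m)
    kl_qm = F.kl_div(m.log(), q, reduction="batchmean")  # KL(q || m)
    return 0.5 * (kl_pm + kl_qm)


def bidirectional_alignment_loss(text_repr: torch.Tensor,
                                 image_repr: torch.Tensor) -> torch.Tensor:
    """Hypothetical bi-directional alignment term: average the text->image
    and image->text directions. JSD is symmetric, so the two calls coincide;
    they are kept separate only to mirror the bi-directional description."""
    return 0.5 * (js_divergence(text_repr, image_repr)
                  + js_divergence(image_repr, text_repr))


# Usage sketch: pooled text and image features projected to a shared 256-d space
# (stand-ins for the SMGCN and DMGCN outputs described in the abstract).
text_feat = torch.randn(8, 256)
image_feat = torch.randn(8, 256)
loss = bidirectional_alignment_loss(text_feat, image_feat)
```

In such a setup the alignment term would typically be added to the aspect-level sentiment classification loss with a weighting coefficient; the abstract does not specify that weighting, so it is left out here.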
Journal overview:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.