A guard against ambiguous sentiment for multimodal aspect-level sentiment classification

IF 6.9 · CAS Region 1 (Management) · JCR Q1, Computer Science, Information Systems
Yanjing Wang , Kai Sun , Bin Shi , Hao Wu , Kaihao Zhang , Bo Dong
{"title":"A guard against ambiguous sentiment for multimodal aspect-level sentiment classification","authors":"Yanjing Wang ,&nbsp;Kai Sun ,&nbsp;Bin Shi ,&nbsp;Hao Wu ,&nbsp;Kaihao Zhang ,&nbsp;Bo Dong","doi":"10.1016/j.ipm.2025.104375","DOIUrl":null,"url":null,"abstract":"<div><div>Recent advances in multimodal learning have achieved state-of-the-art results in aspect-level sentiment classification by leveraging both text and image data. However, images can sometimes contain contradictory sentiment cues or convey complex messages, making it difficult to accurately determine the sentiment expressed in the text. Intuitively, we should only use image data to complement the text if the latter contains ambiguous sentiment or leans toward the neutral polarity. Therefore, instead of trying to forcefully use images as done in prior work, we develop a Guard against Ambiguous Sentiment (GAS) for multimodal aspect-level sentiment classification (MALSC). Built on a pretrained language model, GAS is equipped with a novel “ambiguity learning” strategy that focuses on learning the degree of sentiment ambiguity within the input text. The sentiment ambiguity then serves to determine the extent to which image information should be utilized for accurate sentiment classification. In our experiments with two benchmark twitter datasets, we found that GAS achieves a performance gain of up to 0.98% in macro-F1 score compared to recent methods in the task. Furthermore, we explore the efficacy of large language models (LLMs) in the MALSC task by employing the core ideas behind GAS to design tailored prompts. We show that multimodal LLMs such as LLaVA, when provided with GAS-principled prompts, yields a 2.4% improvement in macro-F1 score for few-shot learning on the MALSC task.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"63 2","pages":"Article 104375"},"PeriodicalIF":6.9000,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Processing & Management","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306457325003164","RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Recent advances in multimodal learning have achieved state-of-the-art results in aspect-level sentiment classification by leveraging both text and image data. However, images can sometimes contain contradictory sentiment cues or convey complex messages, making it difficult to accurately determine the sentiment expressed in the text. Intuitively, we should only use image data to complement the text if the latter contains ambiguous sentiment or leans toward the neutral polarity. Therefore, instead of trying to forcefully use images as done in prior work, we develop a Guard against Ambiguous Sentiment (GAS) for multimodal aspect-level sentiment classification (MALSC). Built on a pretrained language model, GAS is equipped with a novel “ambiguity learning” strategy that focuses on learning the degree of sentiment ambiguity within the input text. The sentiment ambiguity then serves to determine the extent to which image information should be utilized for accurate sentiment classification. In our experiments with two benchmark Twitter datasets, we found that GAS achieves a performance gain of up to 0.98% in macro-F1 score compared to recent methods in the task. Furthermore, we explore the efficacy of large language models (LLMs) in the MALSC task by employing the core ideas behind GAS to design tailored prompts. We show that multimodal LLMs such as LLaVA, when provided with GAS-principled prompts, yield a 2.4% improvement in macro-F1 score for few-shot learning on the MALSC task.
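The abstract outlines an architecture in which a learned ambiguity score gates how much image information enters the fusion. Below is a minimal sketch of that gating idea in PyTorch, assuming pooled text and image embeddings as inputs; every module name, dimension, and the specific gating formula here are hypothetical illustrations, not the authors' implementation.

```python
# A minimal sketch of ambiguity-gated multimodal fusion, inferred from the
# abstract. All names, dimensions, and the gating formula are hypothetical.
import torch
import torch.nn as nn

class AmbiguityGatedFusion(nn.Module):
    def __init__(self, text_dim=768, image_dim=768, num_classes=3):
        super().__init__()
        # Scores how ambiguous the text sentiment is (0 = clear, 1 = ambiguous).
        self.ambiguity_head = nn.Sequential(nn.Linear(text_dim, 1), nn.Sigmoid())
        # Projects image features into the text embedding space.
        self.image_proj = nn.Linear(image_dim, text_dim)
        self.classifier = nn.Linear(text_dim, num_classes)

    def forward(self, text_feat, image_feat):
        # text_feat: (batch, text_dim), e.g. a pooled embedding from a
        # pretrained language model; image_feat: (batch, image_dim).
        alpha = self.ambiguity_head(text_feat)  # (batch, 1)
        # The more ambiguous the text, the more image information is mixed in.
        fused = (1 - alpha) * text_feat + alpha * self.image_proj(image_feat)
        return self.classifier(fused)

# Usage with random tensors standing in for encoder outputs.
model = AmbiguityGatedFusion()
logits = model(torch.randn(4, 768), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 3]) -> negative / neutral / positive
```

One design point worth noting: when the ambiguity score approaches zero, this sketch falls back to text-only classification, matching the abstract's intuition that images should complement the text only when its sentiment is ambiguous or near-neutral.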
Source journal

Information Processing & Management (Engineering & Technology - Computer Science: Information Systems)

CiteScore: 17.00
Self-citation rate: 11.60%
Annual publications: 276
Review time: 39 days

Journal description: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.