Text-Dominant Speech-Enhanced Network for Multimodal Aspect-Based Sentiment Analysis

Xiaoyong Huang, Heli Sun, Xuechun Liu, Jiaruo Wu, Liang He

Information Fusion, Volume 126, Article 103543 (published 2025-07-28)
DOI: 10.1016/j.inffus.2025.103543
URL: https://www.sciencedirect.com/science/article/pii/S1566253525006153
Citations: 0
Abstract
Existing Multimodal Aspect-Based Sentiment Analysis techniques primarily focus on associating visual and textual content but often overlook the critical issue of visual expression deficiency, where images fail to provide complete aspect terms and sufficient sentiment signals. To address this limitation, we propose a Multimodal Aspect-Based Sentiment Analysis network that leverages Text-Dominant Speech Enhancement (TDSEN), which alleviates the deficiency in visual expression by synthesizing speech and employing a text-dominant approach. Specifically, we introduce a Text-Driven Speech Enhancement Layer that generates speech with stable timbre to identify all aspect terms, compensate for the missing parts of visual expression, and provide additional aspect term information and emotional cues. Meanwhile, we design a semantic distance mask matrix to strengthen the model's ability to capture key information from the textual modality. Furthermore, a text-driven multimodal feature fusion module is incorporated to reinforce the dominant role of text and to facilitate multimodal feature interaction and integration for aspect term extraction and sentiment recognition. Comprehensive evaluations on the Twitter-2015 and Twitter-2017 benchmarks demonstrate TDSEN's superiority, achieving absolute improvements of 2.6% and 1.7% over state-of-the-art baselines, with ablation studies confirming the necessity of each component.
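The abstract mentions a semantic distance mask matrix used to filter which tokens attend to one another. The paper's actual formulation is not given here, so the following is only one minimal illustrative reading in NumPy: a binary mask built from pairwise embedding distances, applied additively inside scaled dot-product attention. The function names, the Euclidean metric, and the threshold rule are all assumptions, not the authors' definitions.

```python
import numpy as np

def semantic_distance_mask(token_emb, threshold=1.0):
    """Build a binary attention mask from pairwise embedding distances.

    Token pairs whose embeddings lie within `threshold` (Euclidean
    distance) of each other may attend to one another; more distant
    pairs are masked out. Illustrative sketch only, not the paper's
    actual semantic-distance formula.
    """
    # Pairwise Euclidean distances, shape (n, n), via broadcasting
    diff = token_emb[:, None, :] - token_emb[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # 1 where attention is permitted, 0 where it is suppressed
    return (dist <= threshold).astype(np.float32)

def masked_attention(query, key, value, mask):
    """Scaled dot-product attention with masked pairs suppressed."""
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)
    scores = np.where(mask > 0, scores, -1e9)  # block masked pairs
    # Numerically stable softmax over each row
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ value
```

A usage sketch: with three toy token embeddings where the third is semantically far from the first two, the mask isolates the outlier, so its attention output reduces to its own value vector.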
About the Journal
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.