Hierarchical Signal Calibration and Refinement for Multimodal Sentiment Analysis
Baojian Ren; Tao Cao; Zhengyang Zhang; Shuchen Bai; Na Liu
IEEE Signal Processing Letters, vol. 32, pp. 3450-3454, 28 August 2025
DOI: 10.1109/LSP.2025.3603884
Citation count: 0
Abstract
To address the issues of noise amplification and feature incompatibility arising from modal heterogeneity in multimodal sentiment analysis, this paper proposes a hierarchical optimization framework. In the first stage, we introduce the Semantic-Guided Calibration Network (SGC-Net), which, through a Dynamic Balancing Regulator (DBR), leverages textual semantics to intelligently weight and calibrate the cross-modal interactions of audio and video, thereby suppressing noise while preserving key dynamics. In the second stage, the Synergistic Refinement Fusion Module (SRF-Module) performs a deep refinement of the fused multi-source features. This module employs a Saliency-Gated Complementor (SGC) to rigorously filter and exchange effective information across streams, ultimately achieving feature de-redundancy and strong complementarity. Extensive experiments on the CMU-MOSI and CMU-MOSEI datasets validate the effectiveness of our method, with the model achieving state-of-the-art performance on key metrics such as binary accuracy (Acc-2: 86.73% on MOSI, 86.52% on MOSEI) and seven-class accuracy (Acc-7: 48.35% on MOSI, 53.81% on MOSEI).
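The abstract describes a two-stage design: a Semantic-Guided Calibration Network (SGC-Net) whose Dynamic Balancing Regulator (DBR) uses textual semantics to weight and calibrate the audio and video streams, followed by a Synergistic Refinement Fusion Module (SRF-Module) whose Saliency-Gated Complementor (SGC) filters and exchanges information across streams. The PyTorch sketch below illustrates one plausible reading of that two-stage pipeline; all module internals, shapes, and gating formulas are assumptions made for illustration and are not taken from the paper.

```python
# Illustrative sketch of the two-stage idea described in the abstract.
# All class names, dimensions, and gating formulas are hypothetical; the
# paper's actual SGC-Net / SRF-Module implementations may differ.
import torch
import torch.nn as nn

class DynamicBalancingRegulator(nn.Module):
    """Hypothetical DBR: uses a pooled text vector to weight audio/video streams."""
    def __init__(self, dim):
        super().__init__()
        self.scorer = nn.Linear(dim, 2)  # one weight per non-text modality

    def forward(self, text_vec, audio, video):
        w = torch.softmax(self.scorer(text_vec), dim=-1)        # (B, 2)
        audio_c = audio * w[:, 0].unsqueeze(-1).unsqueeze(-1)   # calibrate audio
        video_c = video * w[:, 1].unsqueeze(-1).unsqueeze(-1)   # calibrate video
        return audio_c, video_c

class SaliencyGatedComplementor(nn.Module):
    """Hypothetical SGC: a gate that filters and exchanges information across two streams."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, x, y):
        g = self.gate(torch.cat([x, y], dim=-1))  # saliency gate in [0, 1]
        return g * x + (1 - g) * y                # keep salient parts of x, complement with y

# Usage sketch with made-up shapes: batch 8, sequence length 20, feature dim 64.
B, T, D = 8, 20, 64
text_vec = torch.randn(B, D)                     # e.g., pooled textual semantics
audio, video = torch.randn(B, T, D), torch.randn(B, T, D)

dbr = DynamicBalancingRegulator(D)
sgc = SaliencyGatedComplementor(D)

audio_c, video_c = dbr(text_vec, audio, video)           # stage 1: semantic-guided calibration
fused = sgc(audio_c.mean(dim=1), video_c.mean(dim=1))    # stage 2: synergistic refinement
print(fused.shape)                                       # torch.Size([8, 64])
```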
Journal Introduction:
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language, and audio processing. Papers published in the Letters can be presented within one year of their appearance at signal processing conferences such as ICASSP, GlobalSIP, and ICIP, as well as at several workshops organized by the Signal Processing Society.