A two-stage adaptive fusion network for multimodal sentiment analysis

IF 3.4 | Zone 2, Computer Science | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Applied Intelligence | Pub Date: 2025-05-07 | DOI: 10.1007/s10489-025-06577-0

Jiaqi Liu, Yong Wang, Jing Yang, Xu Yu, Meng Zhao

Article page: https://link.springer.com/article/10.1007/s10489-025-06577-0

Volume and issue (as listed): 55 10

Citations: 0

TsAFN: A two-stage adaptive fusion network for multimodal sentiment analysis

Multimodal sentiment analysis (MSA) captures human emotional states more accurately than unimodal analysis. However, each modality is limited in how much emotional meaning it can express, so the modalities contribute unequally to the sentiment polarity of the fused representation, and interactions between modalities introduce further polarity biases. Both effects can reduce MSA accuracy. To address this problem, we propose a two-stage adaptive fusion network (TsAFN). The first stage is an adaptive fusion network based on the joint representation of modal features. Features are extracted with BERT and LSTM networks. An importance-metric adaptive benchmark is presented, and on it a feature planning method is built that jointly represents the multimodal features as fused modal features, automatically equalizing the importance of each modality's influence on the fused sentiment polarity. The second stage is an adaptive fusion network based on modal interaction. A distance-metric adaptive benchmark is defined, and on it a representation reconstruction method is built that accounts for inter-modal interactions: the relationships and sentiment polarity biases of the modalities are adjusted to reconstruct the unimodal sentiment polarity and obtain a more accurate representation of the fused modality. Finally, a loss function is defined and the model is trained on three datasets: MOSI, MOSEI, and CH-SIMS. Comparative experiments show that TsAFN can achieve better accuracy in MSA.
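
The abstract does not give the formulas behind the two adaptive benchmarks, so the following is only a rough illustration of the general idea and not the authors' method. It assumes a softmax over hypothetical scalar importance scores as a stand-in for the stage-one importance weighting, and a plain Euclidean distance between each modality and the fused vector as a stand-in for the stage-two distance metric. The names `fuse_by_importance`, `distances_to_fused`, and `scorer`, and the toy feature values, are invented for this sketch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_by_importance(features, scorer):
    """Stage-one stand-in: weight each modality by a softmax over scalar scores.

    features: dict of modality name -> feature vector (all the same length).
    scorer:   hypothetical scoring vector; score = dot(feature, scorer).
    """
    names = sorted(features)
    scores = [sum(f * w for f, w in zip(features[n], scorer)) for n in names]
    weights = dict(zip(names, softmax(scores)))
    dim = len(scorer)
    # Fused vector is the importance-weighted sum of the modality vectors.
    fused = [sum(weights[n] * features[n][i] for n in names) for i in range(dim)]
    return fused, weights


def distances_to_fused(features, fused):
    """Stage-two stand-in: Euclidean distance from each modality to the fused vector."""
    return {n: math.dist(v, fused) for n, v in features.items()}


# Toy 3-dimensional features; a real system would take these from
# BERT (text) and LSTM (audio, video) encoders, as the abstract describes.
features = {
    "text": [0.9, 0.1, 0.3],
    "audio": [0.2, 0.4, 0.1],
    "video": [0.5, 0.5, 0.5],
}
fused, weights = fuse_by_importance(features, scorer=[1.0, 0.0, 0.0])
gaps = distances_to_fused(features, fused)
```

In this toy setup the weights sum to one, so the fused vector is a convex combination of the modality vectors, and `gaps` shows how far each modality sits from the fused representation. A model along the lines of the abstract would presumably learn both quantities rather than fix them by hand.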

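Source journal

Applied Intelligence (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 6.60

Self-citation rate: 20.80%

Articles published: 1361

Review time: 5.9 months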

Journal description: With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.