An adaptive auto fusion with hierarchical attention for multimodal fake news detection

IF 7.5 · CAS Tier 1 (Computer Science) · JCR Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Expert Systems with Applications, Pub Date: 2025-05-08, DOI: 10.1016/j.eswa.2025.127930

Alex Munyole Luvembe, Weimin Li, Shaohua Li, Guiqiong Xu, Xing Wu, Fangfang Liu

Volume 285, Article 127930

Citations: 0

Abstract

Fake news often relies on diverse multimodal evidence to deceive readers and achieve widespread popularity. While existing fusion methods aim to enhance feature interaction, they typically rely on concatenation or attention mechanisms that struggle to model the nuanced dynamics of multimodal information because of missing data and modality heterogeneity. To overcome these limitations, we propose an Adaptive Auto Fusion with Hierarchical Attention (AAFHA) framework for multimodal fake news detection. Unlike prior approaches that treat image captions as siloed inputs, AAFHA integrates them directly into the fusion pipeline to strengthen cross-modal learning. We first design a multi-level interaction between text and captions that uses hierarchical encoding to capture both local and global dependencies, allowing the model to detect subtle cross-modal associations. Then, a sparse weighting technique guided by hierarchical attention further refines these interactions by dynamically allocating attention across modalities. This guided focus is implemented through a constrained SoftMax function, which improves contextual alignment and reduces isolated feature modeling. To enable adaptive semantic integration, we introduce an Auto-Fusion module that supports dynamic end-to-end training. The model optimizes a learned similarity measure in a shared representation space, aligning textual, caption, and image features to adaptively capture semantic associations. Additionally, sparse training with a contrastive loss is incorporated to preserve semantic consistency and enhance class separability during fusion. Experimental results show that AAFHA outperforms existing baselines, with accuracy improvements of 0.094%, 0.198%, and 0.001% on the PolitiFact, Gossip, and Pheme datasets, respectively. These findings demonstrate the model’s effectiveness in identifying multimodal fake news.
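The abstract names its mechanisms (a constrained SoftMax for sparse cross-modal weighting, a similarity measure in a shared representation space, and a contrastive loss) without giving formulas. The pure-Python sketch below illustrates those ideas under stated assumptions; it is not the authors' implementation. It substitutes sparsemax (Martins and Astudillo, 2016) as one well-known constrained softmax that can assign exactly zero weight, uses plain cosine similarity to a hypothetical `query` vector in place of a learned similarity measure, and uses an InfoNCE-style loss as a generic contrastive objective. The function names `sparsemax`, `fuse`, and `info_nce`, the `temperature` value, and the toy vectors are all hypothetical.

```python
import math


def sparsemax(scores):
    # Sparsemax: Euclidean projection of `scores` onto the probability simplex.
    # Used here as a stand-in for the paper's unspecified "constrained SoftMax";
    # unlike softmax, it can give exactly zero weight to low-scoring inputs.
    z = sorted(scores, reverse=True)
    cumsum, k_star, support_sum = 0.0, 0, 0.0
    for k, z_k in enumerate(z, start=1):
        cumsum += z_k
        if 1 + k * z_k > cumsum:  # z_k still belongs to the support
            k_star, support_sum = k, cumsum
    tau = (support_sum - 1.0) / k_star  # threshold subtracted from every score
    return [max(s - tau, 0.0) for s in scores]


def cosine(u, v):
    # Cosine similarity; the small epsilon guards against zero-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def fuse(features, query):
    # Hypothetical fusion step: `features` maps modality name -> vector that is
    # assumed to be already projected into a shared space. Each modality is
    # scored against `query`, the scores become sparse weights, and the fused
    # vector is the weighted sum.
    names = list(features)
    weights = sparsemax([cosine(features[n], query) for n in names])
    fused = [
        sum(w * features[n][d] for n, w in zip(names, weights))
        for d in range(len(query))
    ]
    return fused, dict(zip(names, weights))


def info_nce(anchors, positives, temperature=0.1):
    # InfoNCE-style contrastive loss (a generic choice, assumed here): anchors[i]
    # should be closest to positives[i]; every other positives[j] is a negative.
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(anchors)


if __name__ == "__main__":
    toy = {"text": [1.0, 0.0], "caption": [0.8, 0.6], "image": [0.0, 1.0]}
    fused, weights = fuse(toy, query=[1.0, 0.0])
    print(weights)  # roughly text 0.6, caption 0.4, image 0.0
    print(fused)    # roughly [0.92, 0.24]
```

With these toy inputs the image vector is orthogonal to the query, so sparsemax drops it entirely while text and caption share the weight at roughly 0.6 and 0.4; a standard softmax would instead give every modality a nonzero share. This is the kind of sparse allocation across modalities the abstract describes, shown here only as an illustration.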


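Source journal

Expert Systems with Applications (Engineering & Technology; Engineering, Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 10.60%

Articles published: 2045

Review time: 8.7 months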

About the journal: Expert Systems with Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.