Information Fusion: Latest Articles

DiffMark: Diffusion-based robust watermark against Deepfakes

IF 15.5 | Zone 1, Computer Science

Information Fusion | Pub Date: 2025-10-04 | DOI: 10.1016/j.inffus.2025.103801

Chen Sun, Haiyang Sun, Zhiqing Guo, Yunfeng Diao, Liejun Wang, Dan Ma, Gaobo Yang, Keqin Li

Abstract: Deepfakes pose significant security and privacy threats through malicious facial manipulations. Robust watermarking can aid in authenticity verification and source tracking, but existing methods often lack sufficient robustness against Deepfake manipulations. Diffusion models have demonstrated remarkable performance in image generation, which makes it possible to fuse a watermark seamlessly with an image during generation. In this study, we propose DiffMark, a novel robust watermarking framework based on a diffusion model. By modifying the training and sampling scheme, we use the facial image and the watermark as conditions that guide the diffusion model to progressively denoise and generate the corresponding watermarked image. To construct the facial condition, we weight the facial image by a timestep-dependent factor that gradually reduces the guidance intensity as the noise decreases, which better matches the sampling process of the diffusion model. To fuse the watermark condition, we introduce a cross information fusion (CIF) module that uses a learnable embedding table to adaptively extract watermark features and integrates them with image features via cross-attention. To make the watermark more robust against Deepfake manipulations, we integrate a frozen autoencoder during the training phase to simulate those manipulations. Additionally, we introduce Deepfake-resistant guidance, which uses a specific Deepfake model to adversarially guide the diffusion sampling process toward more robust watermarked images. Experimental results demonstrate the effectiveness of the proposed DiffMark on typical Deepfakes. Our code will be available at https://github.com/vpsg-research/DiffMark.

Volume 127, Article 103801.

Citations: 0
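The DiffMark abstract describes a cross information fusion (CIF) module. Watermark features come from a learnable embedding table and are fused with image features through cross-attention. The pure-Python sketch below illustrates that general pattern and is a minimal illustration only. Several details are assumptions, not the paper's design: the per-position, per-bit table layout, the random weights standing in for trained parameters, the residual connection, and the names `watermark_tokens` and `cross_attention`.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as nested lists."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def watermark_tokens(bits, table):
    """Hypothetical lookup: one embedding per (bit position, bit value) pair."""
    return [table[pos][bit] for pos, bit in enumerate(bits)]


def cross_attention(img_feats, wm_feats, w_q, w_k, w_v):
    """Image features (queries) attend to watermark features (keys/values)."""
    q = matmul(img_feats, w_q)
    k = matmul(wm_feats, w_k)
    v = matmul(wm_feats, w_v)
    d = len(q[0])
    scores = [[sum(qi[c] * kj[c] for c in range(d)) / math.sqrt(d) for kj in k] for qi in q]
    attn = [softmax(row) for row in scores]
    out = matmul(attn, v)
    # Residual connection (assumption): add the attended watermark signal to the image features.
    return [[x + y for x, y in zip(f, o)] for f, o in zip(img_feats, out)]


# Toy usage: random weights stand in for learned parameters (assumption).
rng = random.Random(0)
D, N_PIX, N_BITS = 8, 16, 30


def rand_mat(rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


table = [[rand_mat(1, D)[0] for _ in range(2)] for _ in range(N_BITS)]
bits = [rng.randint(0, 1) for _ in range(N_BITS)]
img = rand_mat(N_PIX, D)
fused = cross_attention(img, watermark_tokens(bits, table), rand_mat(D, D), rand_mat(D, D), rand_mat(D, D))
```

Each image position receives a weighted mix of watermark-token values, so every bit can influence every spatial location. The timestep-dependent facial weighting and the Deepfake-simulating autoencoder are not modeled here.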

MDM-NER: A multiple dependency modeling driven named entity recognition approach for judicial documents

IF 15.5 | Zone 1, Computer Science

Information Fusion | Pub Date: 2025-10-04 | DOI: 10.1016/j.inffus.2025.103807

Haoze Yu, Qunhe Ji, Yan Li, Huanpu Yin, Haisheng Li, Xiaohui Li, Junping Du

Abstract: In judicial documents, nested entities often span a wide range of text, which makes entity recognition difficult. This paper proposes a multiple dependency modeling driven named entity recognition model (MDM-NER). Its encoder module integrates Multi-Head Attention (MHA) and Cross-Attention (CA) to capture the associations between characters and words. Its decoder module, composed of a joint predictor and a Conditional Random Field (CRF) model, performs multi-dimensional collaborative recognition and label sequence optimization of nested entities. In comparative experiments on Chinese corpora, English corpora, and a constructed Judicial Document Corpus (JudDC), the model showed better overall performance than existing models, demonstrating its adaptability, robustness, and transferability. Ablation experiments verified the effectiveness of the key components, the integrated attention (MHA-CA) and the CRF. The paper also discusses how two hyperparameters, the number of heads and the dilation rate, affect model performance. As a key preliminary step, the proposed MDM-NER and the constructed JudDC can support the construction of judicial document knowledge graphs, as well as the local deployment and data expansion of vertical LLMs for judicial authorities.

Volume 127, Article 103807.

Citations: 0
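MDM-NER's decoder uses a CRF for label sequence optimization. In a standard linear-chain CRF, the best label sequence is recovered with Viterbi decoding over emission and transition scores. The sketch below shows only that generic decoding step. The BIO labels, the scores, and the large negative penalty that forbids `O -> I-PER` are hypothetical, and the paper's joint predictor for nested entities is not reproduced.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence for one sentence.

    emissions[i][y]: score of label y at token i (e.g. produced by an encoder).
    transitions[a][b]: score of moving from label a to label b.
    """
    n_labels = len(labels)
    score = list(emissions[0])  # best score of any path ending in each label
    backptrs = []
    for emit in emissions[1:]:
        new_score, ptr = [], []
        for b in range(n_labels):
            best_a = max(range(n_labels), key=lambda a: score[a] + transitions[a][b])
            new_score.append(score[best_a] + transitions[best_a][b] + emit[b])
            ptr.append(best_a)
        score = new_score
        backptrs.append(ptr)
    # Trace the best path backwards from the best final label.
    path = [max(range(n_labels), key=lambda y: score[y])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return [labels[y] for y in path]


# Hypothetical BIO labels; a large penalty forbids O -> I-PER.
LABELS = ["O", "B-PER", "I-PER"]
TRANS = [[0.0, 0.0, -1e4], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
EMIT = [[0.1, 2.0, 0.0], [0.2, 0.1, 1.5], [2.0, 0.0, 0.3]]
print(viterbi_decode(EMIT, TRANS, LABELS))  # ['B-PER', 'I-PER', 'O']
```

Because the transition matrix scores whole label transitions, an `I-PER` can only follow `B-PER` or `I-PER` here. This is the kind of sequence-level constraint a CRF layer adds on top of per-token scores.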

A Comparison and Critical Reflection of Information Disorder Detection Techniques: Performing a Cross-Data and Cross-Model Evaluation

IF 15.5 | Zone 1, Computer Science

Information Fusion | Pub Date: 2025-10-04 | DOI: 10.1016/j.inffus.2025.103806

Mark Nicolas Gruensteidl, Sabrina Kirrane

Abstract: Information disorders, such as dis-, mis-, and malinformation, can lead to societal and/or economic harm. They spread rapidly, are widely consumed on the web, and threaten democracy. AI-based detection models can identify information disorders to some extent, but the changing characteristics of news and concept drift remain major issues. A key requirement is a model's generalization ability, meaning its robustness when applied to unseen data. This work aims to better understand the state of the art in information disorder detection. To that end, it conducts a reproducibility study and a cross-data and cross-model comparative analysis that yields: (i) insights into the effectiveness of binary information disorder classification; (ii) performance results on seen and unseen data; and (iii) new mixed European datasets named MENA. We evaluate a fine-tuned BERT-based model on European data, which has so far received limited attention. The best-performing models in our experiments are RoBERTa and Longformer. The evaluation reveals potential dataset biases, and these insights can be used to improve a model's generalization ability. We also show that fine-tuning on domain-specific datasets contributes to model robustness. Finally, we provide takeaways concerning reproducibility and stress the need for more transparent AI-based detection techniques.

Volume 127, Article 103806.

Citations: 0

Towards a foundation model for behavioral intention prediction through multi-source data fusion

IF 15.5 | Zone 1, Computer Science

Information Fusion | Pub Date: 2025-10-03 | DOI: 10.1016/j.inffus.2025.103799

Weikang He, Yunpeng Xiao, Tun Li, Rong Wang, Qian Li

Abstract: In recent years, some studies have attempted to build foundation models that predict user behavioral intentions across various scenarios. Existing methods typically involve two steps. First, Pre-trained Language Models (PLMs) convert user historical behaviors into semantic vectors. Second, a behavioral sequence encoder captures latent user intentions. This pipeline still has limitations. First, most studies keep the parameters of PLMs fixed, which makes it difficult to optimize the generated semantic vectors for specific application scenarios. Fine-tuning PLMs is an alternative, but their huge size imposes a heavy computational burden. To address this challenge, we introduce a Low-Rank Adaptation technique that improves the quality of semantic vectors by training two lightweight low-rank matrices while keeping the original PLM parameters frozen. Second, the semantic vectors generated by PLMs often exhibit anisotropy, which weakens their expressive power. To address this, we design a parameter whitening expert and a mixed observation expert module that produce more isotropic semantic representations. Finally, user behavior patterns differ significantly across domains, and such behavioral conflicts can limit the model's generalization ability. We therefore introduce multi-domain negative sample fusion during the pre-training phase and design two contrastive learning tasks. A multi-task learning strategy further improves the model's ability to handle multi-domain data. With these improvements, our model achieves stronger adaptability and accuracy in predicting user behavior across different domains.

Volume 127, Article 103799.

Citations: 0
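The abstract above adapts a frozen PLM by training "two lightweight low-rank matrices." A minimal pure-Python sketch of the standard Low-Rank Adaptation (LoRA) formulation, y = Wx + (alpha / r) B A x, is shown below. It assumes LoRA's usual initialization (small random A, zero B), and the class name `LoRALinear` is hypothetical. The paper's exact placement of the adapters inside the PLM is not specified here.

```python
import random


def matvec(m, v):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


class LoRALinear:
    """Hypothetical layer: y = W x + (alpha / r) * B (A x); W is frozen, only A and B train."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen pre-trained weight, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        # A: (rank x d_in) small random init; B: (d_out x rank) zero init,
        # so the adapted layer initially reproduces the frozen layer exactly.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * dy for y, dy in zip(base, delta)]

    def trainable_params(self):
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


d, r = 64, 4
rng = random.Random(1)
w = [[rng.gauss(0.0, 0.02) for _ in range(d)] for _ in range(d)]
layer = LoRALinear(w, rank=r)
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
print(layer.trainable_params(), d * d)  # 512 4096
```

With rank 4 on a 64 x 64 layer, only 512 of 4096 weights would be trainable. The savings grow with layer size, which is the computational argument the abstract makes against full fine-tuning.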

EVA-S3PC: Efficient, verifiable, accurate secure matrix multiplication protocol assembly and its application in regression

IF 15.5 | Zone 1, Computer Science

Information Fusion | Pub Date: 2025-10-03 | DOI: 10.1016/j.inffus.2025.103803

Shizhao Peng, Tianle Tao, Derun Zhao, Tianrui Liu, Shoumo Li, Hao Sheng, Haogang Zhu

Abstract: Efficient multi-party secure matrix multiplication (SMM) is crucial for privacy-preserving machine learning (PPML), but existing mixed-protocol frameworks often struggle to balance security, efficiency, and accuracy. This paper presents an efficient, verifiable, and accurate secure three-party computing (EVA-S3PC) framework that addresses these challenges with elementary two-party and three-party matrix operations based on data obfuscation techniques. The approach includes basic protocols for secure matrix multiplication, inversion, and hybrid multiplication, ensuring computational security and result verifiability. EVA-S3PC uses Monte Carlo methods for robust anomaly detection, achieving a negligible error rate with a verification overhead that drops below 10% for large-scale tasks. Experimental results show that EVA-S3PC achieves up to 14 significant decimal digits of precision in Float64 calculations while reducing communication overhead by up to 54.8% compared with state-of-the-art methods. Furthermore, regression models trained with EVA-S3PC on vertically partitioned data achieve accuracy nearly identical to plaintext training. The framework's application to secure three-party linear regression illustrates its potential in distributed PPML scenarios. It offers a scalable, efficient, and precise solution for secure collaborative modeling in domains such as healthcare and finance.

Volume 127, Article 103803.

Citations: 0
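EVA-S3PC verifies results with "Monte Carlo methods." The abstract does not give the exact scheme, so the sketch below uses a well-known stand-in, Freivalds' algorithm. It probabilistically checks a claimed product C = AB in O(n^2) work per round instead of recomputing the product in O(n^3). Treat it as an illustration of Monte Carlo verification in general, not as the paper's protocol. The tolerance parameter `tol` is an assumption added for floating-point inputs.

```python
import random


def matvec(m, v):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def freivalds_check(a, b, c, rounds=20, tol=1e-9, rng=None):
    """Monte Carlo check that a @ b == c without recomputing the product.

    Each round draws a random 0/1 vector r and compares a (b r) with c r.
    Over exact arithmetic, a wrong c survives one round with probability at
    most 1/2, so it is accepted after `rounds` rounds with probability <= 2**-rounds.
    """
    rng = rng or random.Random()
    m = len(c[0])
    for _ in range(rounds):
        r = [rng.randint(0, 1) for _ in range(m)]
        lhs = matvec(a, matvec(b, r))
        rhs = matvec(c, r)
        if any(abs(x - y) > tol for x, y in zip(lhs, rhs)):
            return False  # definitely wrong: some entry of c is inconsistent
    return True  # correct with high probability


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_good = [[19, 22], [43, 50]]
C_bad = [[19, 22], [43, 51]]
print(freivalds_check(A, B, C_good, rng=random.Random(0)))  # True
```

The verifier never forms A @ B itself, which is why such checks can keep verification overhead small relative to the multiplication being verified.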

A novel underwater target tracking method in UASNs via collaborative deep reinforcement learning

IF 15.5 | Zone 1, Computer Science

Information Fusion | Pub Date: 2025-10-02 | DOI: 10.1016/j.inffus.2025.103797

Linyao Zheng, Meiqin Liu, Senlin Zhang, Shanling Dong

Abstract: Modern underwater acoustic sensor networks (UASNs) are vital infrastructure for marine surveillance. They face two challenges in underwater target tracking under resource constraints: energy-efficient sensor scheduling and correlation-aware data fusion. Existing UASN-based target tracking methods have key limitations. Their scheduling depends on the environment and adapts poorly, their multi-sensor fusion relies on predefined correlation models, and they optimize inherently coupled tasks separately. To address these issues, we develop a cooperative deep reinforcement learning (CDRL)-based framework for underwater target tracking that optimizes these tasks jointly through coordinated policy design. In this framework, a scheduling agent adaptively selects energy-efficient sensing platforms under dynamic conditions, while a fusion agent uses a model-free strategy that removes the need for precise correlation models. Both agents are trained with Proximal Policy Optimization (PPO) in a multi-agent coordination architecture equipped with a global critic, enabling collaborative decision-making across tasks. In addition, a mock data method reduces reliance on accurate ground truth and improves robustness against non-cooperative targets. Numerical simulation and real-world experimentation confirm that the proposed framework consistently outperforms conventional approaches, with at least a 15% improvement in energy efficiency.

Volume 127, Article 103797.

Citations: 0
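Both agents in the framework above are trained with PPO under a global critic. The sketch below shows the standard PPO clipped surrogate loss. It also shows, as a simplifying assumption, how a shared critic's one-step TD advantages could feed both the scheduling agent and the fusion agent. The function names, the one-step advantage (PPO implementations often use GAE instead), and all numbers are hypothetical.

```python
import math


def one_step_advantages(rewards, values, next_values, gamma=0.99):
    """A_t = r_t + gamma * V(s_{t+1}) - V(s_t), with V from a shared global critic."""
    return [r + gamma * nv - v for r, v, nv in zip(rewards, values, next_values)]


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective, averaged over a batch.

    ratio = pi_new(a|s) / pi_old(a|s); each sample contributes
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


# Assumption: both agents share the critic's advantages but use their own action log-probabilities.
adv = one_step_advantages(rewards=[1.0, 0.0], values=[0.5, 0.2], next_values=[0.4, 0.1])
sched_loss = ppo_clip_loss([math.log(0.6), math.log(0.3)], [math.log(0.5), math.log(0.4)], adv)
fusion_loss = ppo_clip_loss([math.log(0.9), math.log(0.5)], [math.log(0.6), math.log(0.5)], adv)
```

Clipping caps how far a single update can move each agent's policy, and the shared advantages tie both agents' updates to the same value estimate. That is one common way a global critic couples cooperating agents.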

Federated unlearning using diffusive noise injection

IF 15.5 | Zone 1, Computer Science

Information Fusion | Pub Date: 2025-10-02 | DOI: 10.1016/j.inffus.2025.103796

Muhammad Mustafa Ali Usmani, Muhammad Atif Tahir, Humna Faisal, Muhammad Rafi

Abstract: Recent advances in machine learning and deep learning have transformed daily life by extracting patterns and insights from user data. As data privacy concerns rise, the "right to be forgotten" has become increasingly important. It has driven the development of machine unlearning, a technique for removing specific data contributions from trained models. Most existing unlearning research assumes a centralized setting in which the data resides on a central server. This assumption breaks down in federated learning (FL), where data stays decentralized across clients who train a shared model without exposing raw data. This decentralized architecture makes unlearning much harder: specific data contributions must be identified and removed, global model performance must be preserved, and privacy must be ensured. To address these issues, we propose a client-level machine unlearning framework based on Diffusive Noise Injection (DNI). DNI gradually perturbs training inputs with structured noise to steer the model away from memorizing specific samples or classes, followed by a global model healing phase that restores accuracy and stability. We evaluate the approach with Convolutional Neural Networks (CNNs) and Vision Transformers on standard FL benchmarks (CIFAR-10, CIFAR-100, and MNIST) and on the KVASIR medical image dataset. Experimental results show that our method effectively unlearns target data while maintaining high accuracy, with performance comparable to state-of-the-art unlearning techniques across all datasets and model architectures.

Volume 127, Article 103796.

Citations: 0
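The DNI abstract says only that inputs are perturbed "gradually" with "structured noise." As an assumption, the sketch below models that perturbation with the DDPM-style forward process x_t = sqrt(alpha_bar_t) x + sqrt(1 - alpha_bar_t) eps under a linear beta schedule. Each unlearning round moves the forget-set inputs further toward pure noise. The schedule constants and the name `diffusive_forget_batches` are hypothetical. The subsequent global healing phase is not shown.

```python
import math
import random


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a DDPM-style linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def noise_input(x, alpha_bar, rng):
    """x_t = sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps, with eps ~ N(0, I)."""
    keep, spread = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [keep * xi + spread * rng.gauss(0.0, 1.0) for xi in x]


def diffusive_forget_batches(forget_set, rounds, rng, num_steps=1000):
    """Hypothetical schedule: each unlearning round sees a noisier copy of the forget set."""
    alpha_bars = linear_alpha_bars(num_steps)
    for r in range(rounds):
        t = (r + 1) * num_steps // rounds - 1  # step evenly toward the final, noisiest step
        yield t, [noise_input(x, alpha_bars[t], rng) for x in forget_set]


rng = random.Random(0)
forget_set = [[1.0, 0.5, -0.3], [0.2, -0.8, 0.9]]  # toy client samples to forget
noisy_rounds = list(diffusive_forget_batches(forget_set, rounds=5, rng=rng))
print([t for t, _ in noisy_rounds])  # [199, 399, 599, 799, 999]
```

By the last round, sqrt(alpha_bar) is close to zero, so the forget-set inputs carry almost no original signal. A client training on them would be pushed away from whatever it memorized about those samples.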

PCHT-GAN: Physics-guided adaptive fusion with dynamic low-rank attention for pipeline leak diagnosis under imbalanced data

IF 15.5 | Zone 1, Computer Science

Information Fusion | Pub Date: 2025-10-02 | DOI: 10.1016/j.inffus.2025.103802

Yongqiang Zhu, Shuaiyong Li, Xianming Lang, Liang Liu

Abstract: In industrial pipeline leak detection, imbalanced data distributions and complex physical mechanisms limit the accuracy and reliability of intelligent diagnostic models. Existing data augmentation methods expand sample sizes, but because they cannot incorporate physical constraints, the generated data deviate from leak response patterns. This significantly degrades model generalization and engineering applicability. To address this, this paper proposes a physically coupled hybrid transformer generative adversarial network (PCHT-GAN) framework that deeply integrates physical mechanisms with generative models for physics-informed, high-reliability data generation. First, a physical mechanism model is embedded into the generator. It follows a collaborative paradigm of mechanism prediction plus data compensation to ensure joint physical distribution consistency. Second, a dynamic low-rank bilinear spatiotemporal transformer (DLR-BiST) captures the long-range spatiotemporal dependencies and transient characteristics of leakage signals. It reduces computational complexity via dynamic low-rank projections while retaining critical features through bilinear spatiotemporal attention. Third, a residual-guided attention gate network (ReAG-Net) uses physical residuals to dynamically generate attention weights. These weights guide the generator to focus on critical physical anomaly regions and perform adaptive compensation. Finally, a multi-task discriminator with parallel constraint branches balances distribution consistency and physical consistency in the generated data. Experimental results show that data generated by the proposed model significantly outperform all baseline methods in physical consistency and distribution quality, substantially improving the recognition performance of fault diagnosis models.

Volume 127, Article 103802.

Citations: 0
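DLR-BiST reduces attention cost through "dynamic low-rank projections." The sketch below uses the general Linformer-style idea: keys and values are compressed from n sequence positions to r slots, so the score matrix is n x r instead of n x n. The "dynamic" part, an input-dependent softmax pooling matrix built by `dynamic_projection`, is a hypothetical stand-in, and the bilinear spatiotemporal attention is not modeled. For brevity, the raw features double as queries, keys, and values.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as nested lists."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_projection(x, w_p):
    """Hypothetical input-dependent projection P of shape (r x n).

    Row j is a softmax over positions of x_i . w_p[j], so P pools the n
    time steps into r summary slots that depend on the current input.
    """
    return [softmax([sum(xi[c] * wj[c] for c in range(len(wj))) for xi in x]) for wj in w_p]


def low_rank_attention(q, k, v, proj):
    """Attention with keys/values compressed from n to r rows: scores are n x r, not n x n."""
    k_low = matmul(proj, k)  # (r x d)
    v_low = matmul(proj, v)  # (r x d)
    d = len(q[0])
    scores = [[sum(qi[c] * kj[c] for c in range(d)) / math.sqrt(d) for kj in k_low] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v_low), weights


rng = random.Random(0)
n, d, r = 32, 4, 4  # sequence length, feature size, rank
x = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
w_p = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(r)]
out, weights = low_rank_attention(x, x, x, dynamic_projection(x, w_p))
```

The cost of the score computation falls from O(n^2 d) to O(n r d). This is the general trade-off behind low-rank attention for long sensor sequences.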

Tensor representation-based dynamic graph neural network for traffic flow prediction using auxiliary information

IF 15.5 | Zone 1, Computer Science

Information Fusion | Pub Date: 2025-10-02 | DOI: 10.1016/j.inffus.2025.103794

Jianli Zhao, Yiran Hua, Huan Huo, Qiuxia Sun, Qing Li, Hailong Zhang

Abstract: Accurate traffic flow prediction is essential for reducing congestion in urban traffic management. Traditional deep learning methods, however, struggle with the complex dynamic relationships among multi-source data. They also have large parameter counts and high computational complexity, and they inherit the limitations of purely data-driven approaches. To address these challenges, this study introduces the Tensor Representation-based Auxiliary Information Fusion Network (TrafNet). TrafNet integrates various types of traffic data into dynamic graph tensors and uses dynamic graph convolution to uncover local dynamic correlations across multi-source data. It also strengthens global dynamic relationship modeling through shared periodic embeddings, which helps the model capture temporal dependencies between traffic data more accurately. In addition, TrafNet uses tensor representation learning to decompose the dynamic graph tensors into a product of multiple small factors, reducing the number of model parameters. Finally, Laplacian graph embeddings serve as initial values for the dynamic graph tensor factors, improving model stability and convergence speed. Experimental results on three publicly available datasets show that TrafNet achieves higher prediction accuracy and stability than traditional methods.

Volume 127, Article 103794.

Citations: 0
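TrafNet decomposes dynamic graph tensors "into a multiplicative form of multiple small factors" to reduce parameters. One common example of such a factorization is a rank-R CP (CANDECOMP/PARAFAC) decomposition of an N x N x T adjacency tensor, sketched below. The paper may use a different factorization, and the Laplacian-embedding initialization of the factors is not shown. The node count, step count, and rank in the usage lines are illustrative values, not figures from the paper.

```python
def cp_reconstruct(u, v, w):
    """Rebuild an (N x N x T) tensor from rank-R CP factors.

    A[i][j][t] = sum_r u[i][r] * v[j][r] * w[t][r]
    u: source-node factors (N x R), v: target-node factors (N x R), w: time factors (T x R).
    """
    rank = len(u[0])
    return [[[sum(u[i][r] * v[j][r] * w[t][r] for r in range(rank))
              for t in range(len(w))]
             for j in range(len(v))]
            for i in range(len(u))]


def param_counts(n_nodes, n_steps, rank):
    """Parameters for a dense dynamic adjacency tensor vs. its CP factors."""
    dense = n_nodes * n_nodes * n_steps
    factored = rank * (2 * n_nodes + n_steps)
    return dense, factored


print(param_counts(300, 288, 10))  # (25920000, 8880)
print(cp_reconstruct([[1, 2]], [[3, 4]], [[5, 6]]))  # [[[63]]]
```

The dense tensor grows as N^2 T, while the factors grow only linearly in N and T. This is why a factorized representation can cut parameter counts by orders of magnitude.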

Bridging demonstration and multi-source attention fusion using LLMs for grounded multimodal named entity recognition

IF 15.5 | Zone 1, Computer Science

Information Fusion | Pub Date: 2025-10-02 | DOI: 10.1016/j.inffus.2025.103800

Hua Zhang, Xianlv Liang, Wanxiang Cai, Pengliang Chen, Bi Chen, Bo Jiang, Ye Wang

Abstract: Grounded multimodal named entity recognition (GMNER) is a challenging, emerging task that aims to identify all entity-type-region triplets in multimodal image-text pairs. Existing approaches often lack sufficient interaction between named entities and visual regions. This makes accurate triplet alignment, cross-modal entity disambiguation, and visual semantic grounding difficult. To tackle these challenges, we present a novel two-stage GMNER framework that integrates demonstration retrieval and multi-source cross-layer attention fusion. The first stage, for MNER, uses an entity-aware attention mechanism to select task-relevant demonstration examples, which lets large language models (LLMs) generate high-quality external knowledge. The second stage, for visual grounding, introduces the multi-source multi-head cross-layer attention fusion (MMCAF) module to achieve sufficient cross-modal semantic interaction. MMCAF integrates multi-source inputs: raw text, named and visual entity expressions, and image captions. Within this two-stage framework, we adopt a dual-LLM architecture with both a text LLM and a vision LLM. This separates the generation of semantic priors from visual-language alignment and bridges gaps in cross-modal understanding. Our model achieves state-of-the-art performance on two GMNER datasets of different granularity (Twitter-GMNER and Twitter-FMNERG), and ablation experiments and cross-domain evaluation further demonstrate its superiority.

Volume 127, Article 103800.

Citations: 0
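The first GMNER stage above selects task-relevant demonstrations before prompting an LLM. The sketch below illustrates the retrieve-then-prompt pattern. As a simplifying assumption, it replaces the paper's entity-aware attention with plain cosine similarity over precomputed embeddings. The toy vectors, example sentences, prompt wording, and function names are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_demonstrations(query_vec, pool, k=2):
    """Return the k pool items whose embeddings are closest to the query."""
    return sorted(pool, key=lambda item: cosine(query_vec, item["vec"]), reverse=True)[:k]


def build_prompt(demos, query_text):
    """Assemble a hypothetical few-shot NER prompt from the retrieved demonstrations."""
    parts = ["Extract named entities and their types."]
    for demo in demos:
        parts.append(f"Text: {demo['text']}\nEntities: {demo['entities']}")
    parts.append(f"Text: {query_text}\nEntities:")
    return "\n\n".join(parts)


# Toy demonstration pool with made-up 2-D embeddings.
pool = [
    {"id": "a", "vec": [1.0, 0.0], "text": "Alice joined Acme Corp", "entities": "Alice (PER); Acme Corp (ORG)"},
    {"id": "b", "vec": [0.0, 1.0], "text": "Rain expected in Paris", "entities": "Paris (LOC)"},
    {"id": "c", "vec": [0.9, 0.1], "text": "Bob met Carol at Acme Corp", "entities": "Bob (PER); Carol (PER); Acme Corp (ORG)"},
]
demos = select_demonstrations([1.0, 0.0], pool, k=2)
print([d["id"] for d in demos])  # ['a', 'c']
prompt = build_prompt(demos, "Dana visited Globex in Berlin")
```

Retrieving demonstrations that resemble the query gives the LLM in-context examples of the entity types it is likely to meet. The paper's entity-aware attention would refine which examples count as similar.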