Information Fusion | Volume 127, Article 103765 | Pub Date: 2025-09-22 | DOI: 10.1016/j.inffus.2025.103765
Title: MALM-CLIP: A generative multi-agent framework for multimodal fusion in few-shot industrial anomaly detection
Authors: Hanzhi Chen, Jingbin Que, Kexin Zhu, Zhide Chen, Fei Zhu, Wencheng Yang, Xu Yang, Xuechao Yang
Abstract: The Contrastive Language-Image Pre-training (CLIP) model has significantly improved few-shot industrial anomaly detection. However, existing approaches often rely on manually crafted visual description texts, which lack robustness and generalizability in real-world production settings. This limitation is evident as these methods struggle to adapt to new or evolving anomalies, where the original prompts fail to generalize beyond their initial design. This paper proposes a novel method, Multi-agent Language Models with CLIP (MALM-CLIP), which integrates the generative capabilities of large language models (LLMs) with CLIP within a multi-agent framework. In this system, specialized agents handle different subtasks, such as prompt generation and model evaluation, enabling automated and context-aware multimodal information fusion. By eliminating manual prompt engineering, MALM-CLIP enhances both the accuracy and efficiency of anomaly detection. Experimental results on standard datasets such as MVTec and VisA demonstrate that our approach outperforms existing methods in detecting image-level anomalies with minimal training data. This work highlights the potential of combining Generative Artificial Intelligence (GenAI) and multi-agent systems for robust few-shot industrial anomaly detection.
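The paper's multi-agent prompt-generation pipeline is not reproduced in the abstract, but the CLIP-side scoring step it automates is standard. A minimal, hypothetical sketch of image-level anomaly scoring from prompt and image embeddings; the random 512-d vectors merely stand in for CLIP encoder outputs, and `temperature` is an illustrative choice, not a value from the paper:

```python
import numpy as np

def anomaly_score(image_emb, normal_emb, anomalous_emb, temperature=0.07):
    """Softmax over cosine similarities to a 'normal' and an 'anomalous'
    text prompt; returns the probability mass on 'anomalous'."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    sims = np.array([cos(image_emb, normal_emb),
                     cos(image_emb, anomalous_emb)]) / temperature
    sims -= sims.max()                          # numerical stability
    probs = np.exp(sims) / np.exp(sims).sum()
    return float(probs[1])                      # P(anomalous)

# Random vectors stand in for CLIP image/text encoder outputs.
rng = np.random.default_rng(0)
normal_prompt = rng.normal(size=512)
anomalous_prompt = rng.normal(size=512)
image = normal_prompt + 0.05 * rng.normal(size=512)   # near-normal image
score = anomaly_score(image, normal_prompt, anomalous_prompt)
```

An image whose embedding sits close to the "normal" prompt receives a score near 0; in MALM-CLIP the prompts themselves would come from the LLM agents rather than being hand-written.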
Information Fusion | Volume 127, Article 103762 | Pub Date: 2025-09-22 | DOI: 10.1016/j.inffus.2025.103762
Title: LeCDSR: Large language model enhanced cross-domain sequential recommendation
Authors: Shuliang Wang, Jiabao Zhu, Kaibo Wang, Sijie Ruan
Abstract: As large language models (LLMs) have shown great performance in natural language processing, research on applying them to recommendation systems has emerged. LLMs' strong understanding, reasoning, and extensive world knowledge can supplement the missing semantic information in recommendation systems. However, existing LLM-enhanced recommendation systems face challenges in extracting and leveraging features, and they do not sufficiently utilize LLMs' capabilities to capture user interests. In this paper, a novel algorithm, Large language model enhanced Cross-Domain Sequential Recommendation (LeCDSR), is proposed. LeCDSR generates cross-domain user profile embeddings through LLMs to transfer user preference information across domains. It also uses a semantic fusion layer to integrate semantic and ID embeddings, addressing the limitations of traditional sequential recommendation models. Furthermore, LeCDSR employs a contrastive loss function to better align the feature spaces of LLMs and recommendation models, improving recommendation performance in cross-domain scenarios. LeCDSR has been tested on two real-world datasets and achieves better performance than common cross-domain sequential recommendation models. Extensive ablation experiments also verify the effectiveness of LeCDSR's modules and of the embeddings generated by the large model. Our implementation is available at: https://github.com/solozhu/LeCDSR
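The contrastive alignment between LLM and recommendation feature spaces described above is, in its generic form, a symmetric InfoNCE objective over paired embeddings. A minimal numpy sketch under that assumption (the paper's exact loss, batch construction, and temperature may differ):

```python
import numpy as np

def log_softmax(x):
    x = x - x.max(axis=1, keepdims=True)        # numerical stability
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def info_nce(sem, ids, tau=0.1):
    """Symmetric InfoNCE: sem[i] (LLM semantic embedding) and ids[i]
    (ID embedding) are a positive pair; all other in-batch pairs are
    negatives. Lower loss = better-aligned feature spaces."""
    sem = sem / np.linalg.norm(sem, axis=1, keepdims=True)
    ids = ids / np.linalg.norm(ids, axis=1, keepdims=True)
    logits = sem @ ids.T / tau                  # (B, B) cosine similarities
    loss_s2i = -np.mean(np.diag(log_softmax(logits)))    # semantic -> ID
    loss_i2s = -np.mean(np.diag(log_softmax(logits.T)))  # ID -> semantic
    return float((loss_s2i + loss_i2s) / 2)

rng = np.random.default_rng(0)
sem = rng.normal(size=(8, 16))
aligned = info_nce(sem, sem)                    # ID embeddings match semantics
mismatched = info_nce(sem, rng.normal(size=(8, 16)))
```

Perfectly paired embeddings drive the loss toward zero, while independent embeddings score near log(batch size); minimizing this pulls the two spaces together.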
Information Fusion | Volume 127, Article 103751 | Pub Date: 2025-09-21 | DOI: 10.1016/j.inffus.2025.103751
Title: Topology-aware multi-view hypergraph computation-based cross-modal brain network fusion for brain disease diagnosis
Authors: Jingxi Feng, Shaoyi Du, Heming Xu, Rundong Xue, Xiangmin Han, Dong Zhang, Jue Jiang, Yue Gao, Juan Wang
Abstract: Cross-modal brain networks characterize the complex connections between different brain regions from both functional and structural perspectives. Deep fusion of functional and structural brain network information is crucial for brain disease diagnosis. However, existing methods overlook the intricate semantic and topological relationships between functional and structural brain networks, as well as their critical roles in brain network information transmission. To address these limitations, this paper proposes a topology-aware multi-view hypergraph computation-based cross-modal brain network fusion (TMHGC-CBNF) method. TMHGC-CBNF achieves high-precision brain disease diagnosis through topology-aware multi-view brain network modeling, efficient message propagation, and multi-strategy fusion of cross-modal brain network information. Specifically, the topology-aware multi-view hypergraph computation method first constructs multi-view hypergraphs to model multi-level high-order correlations guided by topological structure, along with the semantic correlations under topological constraints, in the functional brain network, while using graph structures to model the structural brain network. Based on this, parallel hypergraph convolutions are employed to simulate efficient information propagation patterns in the functional brain network and extract high-order feature representations for each view, while graph convolution is used to extract feature representations of the structural brain network. Next, a multi-strategy fusion method progressively and orthogonally fuses high-order functional brain network information from different views, and a topological encoding-based dual-channel cross-attention module facilitates the interaction of functional and structural brain network information in the topological space. Experiments on three datasets demonstrate that the proposed method outperforms current state-of-the-art methods and is capable of identifying cross-modal brain network biomarkers.
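The hypergraph convolutions mentioned above typically follow the standard spectral form X' = sigma(Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2) X Theta), where H is the node-hyperedge incidence matrix. A small numpy sketch of one such layer, assuming that common formulation rather than the paper's exact variant:

```python
import numpy as np

def hypergraph_conv(X, H, w_e, Theta):
    """One spectral hypergraph convolution layer:
        X' = ReLU(Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta)
    X: (n, f_in) node features; H: (n, m) incidence matrix;
    w_e: (m,) hyperedge weights; Theta: (f_in, f_out) learnable weights."""
    De = H.sum(axis=0)                           # hyperedge degrees
    Dv = (H * w_e).sum(axis=1)                   # weighted node degrees
    Dv_is = np.diag(1.0 / np.sqrt(np.maximum(Dv, 1e-12)))
    De_inv = np.diag(1.0 / np.maximum(De, 1e-12))
    # Normalized propagation operator over the hypergraph.
    A = Dv_is @ H @ np.diag(w_e) @ De_inv @ H.T @ Dv_is
    return np.maximum(A @ X @ Theta, 0.0)        # ReLU activation

# Toy example: 4 brain regions (nodes), 2 high-order correlations (hyperedges).
H = np.array([[1., 0.], [1., 1.], [0., 1.], [1., 1.]])
X = np.arange(8.0).reshape(4, 2)
out = hypergraph_conv(X, H, np.ones(2), np.eye(2))
```

Each hyperedge lets any number of regions exchange information in one step, which is what makes hypergraphs suitable for the multi-level high-order correlations the abstract describes.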
Information Fusion | Volume 127, Article 103758 | Pub Date: 2025-09-21 | DOI: 10.1016/j.inffus.2025.103758
Title: BotDMM: Dual-channel multi-modal learning for LLM-driven bot detection on social media
Authors: Jinglong Duan, Shiqing Wu, Weihua Li, Quan Bai, Minh Nguyen, Jianhua Jiang
Abstract: Social bots are becoming a growing concern due to their ability to spread misinformation and manipulate public discourse. The emergence of powerful Large Language Models (LLMs), such as ChatGPT, has introduced a new generation of bots capable of producing fluent and human-like text while dynamically adapting their relational patterns over time. These LLM-driven bots seamlessly blend into online communities, making them significantly more challenging to detect. Most existing approaches rely on static features or simple behavioral patterns, which are not effective against bots that can evolve both their language and their network connections. To address these challenges, we propose a novel Dual-channel Multi-Modal learning (BotDMM) framework for LLM-driven bot detection. The proposed model captures discriminative information from two complementary sources: users' content features (including their profiles and temporal posting behavior) and structural features (reflecting local network topology). Furthermore, we employ a joint training approach that incorporates two carefully designed self-supervised learning paradigms alongside the primary prediction task to enhance discrimination between human users, traditional bots, and LLM-driven bots. Extensive experiments demonstrate the effectiveness and superiority of BotDMM compared to state-of-the-art baselines. The implementation of BotDMM has been released at: https://github.com/JaydenDuan/DualChannelBotDMM
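The dual-channel idea reduces, at its simplest, to encoding the content and structural views separately and fusing them before a classification head. A purely illustrative sketch; BotDMM's actual encoders (profile/temporal and graph-based) and its self-supervised objectives are replaced here by fixed random projections:

```python
import numpy as np

def dual_channel_score(content_feat, struct_feat, W_c, W_s, w_out, b_out):
    """Project each channel, concatenate (late fusion), and apply a
    linear head; the sigmoid turns the logit into P(bot)."""
    h_c = np.tanh(content_feat @ W_c)          # content channel
    h_s = np.tanh(struct_feat @ W_s)           # structural channel
    h = np.concatenate([h_c, h_s], axis=-1)    # fuse the two views
    logit = h @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logit))        # sigmoid

# Hypothetical dimensions: 32-d content features, 16-d structural features.
rng = np.random.default_rng(0)
W_c, W_s = rng.normal(size=(32, 8)), rng.normal(size=(16, 8))
w_out, b_out = rng.normal(size=16), 0.0
p = dual_channel_score(rng.normal(size=32), rng.normal(size=16),
                       W_c, W_s, w_out, b_out)
```

In the full model the binary head would be replaced by a three-way classifier (human / traditional bot / LLM-driven bot), as the abstract indicates.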
Information Fusion | Volume 127, Article 103677 | Pub Date: 2025-09-20 | DOI: 10.1016/j.inffus.2025.103677
Title: A survey on long-term traffic prediction from the information fusion perspective: Requirements, methods, applications, and outlooks
Authors: Feifei Kou, Ziyan Zhang, Yuhan Yao, Yuxian Zhu, Jiahao Wang, Ruiping Yuan, Yifan Zhu
Abstract: Long-term traffic prediction (LTP) aims to predict future traffic conditions by fusing multi-dimensional historical data across extended time horizons. It has emerged as a rapidly advancing research domain with extensive applications in predicting traffic flow, speed, accident likelihood, and congestion patterns, thereby significantly enhancing societal mobility and quality of life. Compared with the general traffic prediction task, prediction over long time spans is more challenging, and the internal requirements of LTP need to be summarized to guide the development of this field. However, no comprehensive review has systematically summarized and synthesized them. To address this gap, we present the first systematic survey of LTP from an information fusion perspective, encompassing interval requirements, targeted methodologies, application scenarios, and performance metrics. Specifically, we first establish the knowledge framework of traffic prediction tasks and formalize the concept of LTP, then categorize and analyze existing approaches through the lens of internal requirements. Furthermore, we examine application scenarios alongside corresponding performance benchmarks, datasets, and evaluation metrics. Finally, we delineate prevailing challenges and potential research directions to inspire future investigations.
Information Fusion | Volume 127, Article 103659 | Pub Date: 2025-09-20 | DOI: 10.1016/j.inffus.2025.103659
Title: MedViA: Empowering medical time series classification with vision augmentation and multimodal fusion
Authors: Wei Fan, Jingru Fei, Jindong Han, Jie Lian, Hangting Ye, Xiaozhuang Song, Xin Lv, Kun Yi, Min Li
Abstract: The analysis of medical time series, such as Electrocardiography (ECG) and Electroencephalography (EEG), is fundamental to clinical diagnostics and patient monitoring. Accurate and automated classification of these signals can facilitate early disease detection and personalized treatment, thereby improving patient outcomes. Although deep learning models are widely adopted, they mainly process signals as sequential numerical data. Such a single-modality approach often misses the holistic visual patterns easily recognized by clinicians from graphical charts and struggles to model the complex non-linear dynamics of physiological data. As a result, the rich diagnostic cues contained in visual representations remain largely untapped, limiting model performance. To address these limitations, we propose MedViA, a novel multimodal learning framework that empowers Medical time series classification by integrating both Vision Augmentation and numeric perception. Our core innovation is to augment the raw medical time series signals into the visual modality, enabling a dual-pathway architecture that computationally mimics the comprehensive reasoning of clinical experts. With this augmentation, MedViA features two parallel perception branches: a Visual Perception Module, built upon a novel Multi-resolution Differential Vision Transformer, processes the augmented images to capture high-level structural patterns and diagnostically critical waveform morphologies. Concurrently, a Numeric Perception Module uses our proposed Temporal Kolmogorov Network to model fine-grained and non-linear dynamics directly from the raw time series. To synergistically integrate the insights from these dedicated pathways, we introduce a Medically-informed Hierarchical Multimodal Fusion strategy, which uses a late-fusion architecture and a hierarchical optimization objective to derive the final classification. We have conducted extensive experiments on multiple public medical time series datasets, which demonstrate the superior performance of our method compared to state-of-the-art approaches.
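One simple way to realize the "vision augmentation" idea is to rasterize a 1-D signal into a 2-D image that a vision backbone can consume. The sketch below is an illustrative stand-in; MedViA's actual augmentation and its Multi-resolution Differential Vision Transformer are not described in enough detail in the abstract to reproduce:

```python
import numpy as np

def series_to_image(x, height=32):
    """Rasterize a 1-D signal into a (height, len(x)) binary image by
    marking the quantized amplitude at each time step (row 0 = max)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    norm = (x - lo) / (hi - lo + 1e-12)               # scale to [0, 1]
    rows = ((height - 1) * (1.0 - norm)).round().astype(int)
    img = np.zeros((height, x.size), dtype=np.uint8)
    img[rows, np.arange(x.size)] = 1                  # one pixel per column
    return img

# Toy "waveform": one period of a sine, as an ECG stand-in.
t = np.linspace(0, 2 * np.pi, 64)
img = series_to_image(np.sin(t), height=32)
```

The resulting image preserves the waveform morphology a clinician would see on a chart, which is exactly the information the numeric pathway alone tends to miss.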
Information Fusion | Volume 127, Article 103759 | Pub Date: 2025-09-20 | DOI: 10.1016/j.inffus.2025.103759
Title: FUSE-IS: multi-modal data fusion for carbon-aware security in industrial energy systems
Authors: Huamao Jiang, Fazlullah Khan, Ryan Alturki, Bandar Alshawi, Xiangjian He, Syed Tauhid Ullah Shah
Abstract: Modern industrial energy systems are increasingly reliant on heterogeneous data streams from sensors, grid infrastructure, renewable forecasts, and cybersecurity telemetry. Effectively fusing these diverse sources is essential for achieving resilient, efficient, and sustainable operations. In this paper, we present a Fusion-based Unified Security and Energy efficiency approach for Industrial Systems (FUSE-IS), a novel multi-modal data fusion framework. FUSE-IS integrates deep learning-based threat detection, differential privacy mechanisms, and carbon-aware resource scheduling, enhancing security, privacy, and energy efficiency in industrial energy environments. Unlike traditional solutions that address these objectives in isolation, FUSE-IS combines them in a unified data fusion approach. As a result, it enables real-time adaptive decision-making for threat mitigation, data protection, and carbon-optimized computing. Experimental results demonstrate that FUSE-IS achieves 98.5% detection accuracy with only 1.2% false positives, while reducing energy consumption by 24% and carbon emissions by 20% compared to baseline methods. The framework maintains strong privacy guarantees (ε = 0.9) with minimal accuracy degradation (0.7%). A case study on DDoS mitigation illustrates FUSE-IS's ability to dynamically adjust defense strategies based on carbon intensity fluctuations, resulting in a 27% emission reduction during the attack window.
Information Fusion | Volume 127, Article 103764 | Pub Date: 2025-09-19 | DOI: 10.1016/j.inffus.2025.103764
Title: A multi-stage cuff-less continuous blood pressure estimation method for overcoming subject specificity
Authors: Yongjian Li, Meng Chen, Mingsen Du, Shoushui Wei
Abstract: Cuff-less continuous blood pressure (BP) estimation is essential for hypertension prevention and management. However, subjects differ in vascular characteristics, pre-ejection period, and dynamic physiological states, which leads to inter-class specificity of BP across categories and individual specificity within the same category. This study proposes a multi-stage cuff-less continuous BP estimation method using photoplethysmography and electrocardiogram signals. (1) In the classification stage, a recursively coupled neural network capable of compensating for deep semantic expression is constructed to overcome the impact of inter-class specificity. Based on layer-wise aggregation of channel-encoded information and embedded coordinate attention mechanisms, it captures spatial dependencies of multi-level features, thereby categorizing subjects into predefined classes. (2) In the BP estimation stage, a multi-operator dynamically adjusted neural network is proposed to address individual specificity. Inspired by the human brain's multi-level and multi-perspective information processing mechanisms, it integrates multiple advanced operators to decipher the nonlinear relationship between real-time variations in blood volume, cardiac electrical activity, and BP. Simultaneously, it incorporates a lightweight attention mechanism and a cross-guidance strategy to adaptively adjust the responsiveness of different operators, thereby enhancing its dynamic adaptability. Under the inter-patient paradigm clinical test, mean absolute errors for mean arterial pressure, systolic blood pressure, and diastolic blood pressure reached 3.03±2.38 mmHg, 2.96±2.45 mmHg, and 2.74±2.21 mmHg respectively, meeting both the Association for the Advancement of Medical Instrumentation standards and British Hypertension Society Grade A criteria. This study demonstrates significant implications for overcoming subject specificity and achieving personalized BP management.
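The AAMI standard referenced above requires a mean error within ±5 mmHg and an error standard deviation within 8 mmHg. A small sketch of checking a set of predictions against that criterion and reporting MAE ± SD in the form quoted by such papers; the data here are simulated, not from the study:

```python
import numpy as np

def aami_check(pred, ref):
    """Return (MAE, SD of absolute error, passes_AAMI) where the AAMI
    criterion is |mean error| <= 5 mmHg and error SD <= 8 mmHg."""
    err = pred - ref
    abs_err = np.abs(err)
    me, sd = err.mean(), err.std(ddof=1)
    mae, mae_sd = abs_err.mean(), abs_err.std(ddof=1)
    passes = (abs(me) <= 5.0) and (sd <= 8.0)
    return mae, mae_sd, passes

# Synthetic mean-arterial-pressure references with a small bias and spread.
rng = np.random.default_rng(1)
ref = rng.uniform(70, 110, size=200)
pred = ref + rng.normal(0.5, 3.0, size=200)
mae, mae_sd, ok = aami_check(pred, ref)
```

Note the distinction: the "3.03±2.38 mmHg" figures in the abstract are MAE ± SD of the absolute error, while the AAMI pass/fail test is computed on the signed error.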
Information Fusion | Volume 127, Article 103767 | Pub Date: 2025-09-19 | DOI: 10.1016/j.inffus.2025.103767
Title: Multimodal named entity recognition in the era of large pre-trained models: A comprehensive survey
Authors: Mingying Xu, Fei Hou, Jie Liu, Mengmei Zhang, Lei Shi, Feifei Kou, Lei Guo, Philip S. Yu, Xuming Hu
Abstract: The rapid development of social media, such as Twitter and Facebook, has made tweets an essential resource for various applications, including collecting breaking news, identifying cyber-attacks, and detecting disease outbreaks. As social media becomes increasingly multimodal, Multimodal Named Entity Recognition (MNER) for social media has been widely studied to extract valuable information from tweets and enhance the understanding of tweet content. In recent years, the development of Large Pre-trained Models (LPMs) has revolutionized many research fields, continuously pushing the performance boundaries of various tasks, especially on social media, where tweets exist at a massive scale and involve multiple media types. First, LPMs can effectively handle semantic sparsity and capture richer visual features, providing a more accurate understanding of tweets containing ambiguous or sparse terms; they can also supply background knowledge that enhances the expressiveness of semantically sparse tweets. Second, LPMs possess cross-modal semantic alignment abilities, enabling them to integrate and optimize semantic information from both text and images, effectively fusing multimodal features and reducing noise. However, despite these significant advantages, LPMs still face challenges on complex tasks, particularly when clear supporting evidence is lacking, which leads to erroneous "hallucinated" content. This is primarily due to insufficient contextual support during the fusion of multimodal information, leading to inaccurate reasoning and reduced reliability. MNER, by integrating multimodal information from both text and images, can provide LPMs with more factual grounding and contextual support, reducing the likelihood of hallucinated content and enhancing the reasoning ability and accuracy of LPMs. Therefore, this survey is the first to systematically review the research progress of LPMs in MNER from the perspectives of multimodal representation, multimodal alignment, and multimodal fusion, and to explore the application of LPMs in MNER. Finally, it summarizes the main challenges MNER faces and provides an outlook on future development directions.
Information Fusion | Volume 127, Article 103748 | Pub Date: 2025-09-19 | DOI: 10.1016/j.inffus.2025.103748
Title: Multi-tissue deep fusion network for prediction of pulmonary metastasis in hepatocellular carcinoma
Authors: Shuoling Zhou, Sirui Fu, Wenbo Wang, Shuguang Liu, Lei Yang, Mingyue Cai, Qianjin Feng, Meiyan Huang
Abstract: Pulmonary metastasis is a critical adverse prognostic factor in patients with hepatocellular carcinoma (HCC), underscoring the need for accurate prediction to guide prognoses and treatment decisions. However, current prediction methods are hindered by two major challenges: (1) inter-class similarity and intra-class variation in computed tomography (CT) images, and (2) the predominant methods focus on extracting tumor-associated features, despite evidence that metastasis may often be related to the degree of hepatic cirrhosis and deformation of hepatic vessels. To address these limitations, we propose a multi-tissue deep fusion network (MDFNet) for predicting pulmonary metastasis from CT images. The network employs MeshNet as the backbone to extract spatial structural features and capture tumor heterogeneity, cirrhosis severity, and vascular deformation. A dual-level contrastive learning module highlights feature disparities across tissues to enhance the network's feature representational ability, while a triple attention mechanism-based feature fusion module integrates multi-tissue features to identify essential predictive information. MDFNet was validated on a multi-center dataset including seven clinical centers. The experimental results demonstrate that, compared to existing methods, MDFNet exhibits the highest area under the receiver operating characteristic curve of 0.7948 and accuracy of 0.7622 on an independent testing set. Despite its effectiveness, the model currently uses only single time-point venous-phase CT images; future work will incorporate multi-phase CT sequences and dynamic follow-up scans to further improve prediction performance.