Information Fusion | Pub Date: 2025-04-05 | DOI: 10.1016/j.inffus.2025.103158
Yonglin Tian, Fei Lin, Yiduo Li, Tengchao Zhang, Qiyao Zhang, Xuan Fu, Jun Huang, Xingyuan Dai, Yutong Wang, Chunwei Tian, Bai Li, Yisheng Lv, Levente Kovács, Fei-Yue Wang
"UAVs meet LLMs: Overviews and perspectives towards agentic low-altitude mobility" (Information Fusion, vol. 122, Article 103158)
Abstract: Low-altitude mobility, exemplified by unmanned aerial vehicles (UAVs), has introduced transformative advancements across domains such as transportation, logistics, and agriculture. Leveraging flexible perspectives and rapid maneuverability, UAVs extend the perception and action capabilities of traditional systems, garnering widespread attention from academia and industry. However, current UAV operations depend primarily on human control, offer only limited autonomy in simple scenarios, and lack the intelligence and adaptability needed for more complex environments and tasks. Large language models (LLMs), with their remarkable problem-solving and generalization capabilities, offer a promising pathway for advancing UAV intelligence. This paper explores the integration of LLMs and UAVs, beginning with an overview of the fundamental components and functionalities of UAV systems, followed by a summary of state-of-the-art LLM technology. It then systematically surveys the multimodal data resources available for UAVs, which provide critical support for training and evaluation, and categorizes and analyzes the key tasks and application scenarios where UAVs and LLMs converge. Finally, a reference roadmap towards agentic UAVs is proposed, enabling UAVs to achieve agentic intelligence through autonomous perception, memory, reasoning, and tool utilization. Related resources are available at https://github.com/Hub-Tian/UAVs_Meet_LLMs.
Information Fusion | Pub Date: 2025-04-05 | DOI: 10.1016/j.inffus.2025.103170
Yuhan Wang, Hak Keung Lam
"Explainable variable-weight multi-modal based deep learning framework for catheter malposition detection" (Information Fusion, vol. 122, Article 103170)
Abstract: Hospital patients may have catheters and lines inserted for quick administration of medicines or for medical tests, but a misplaced catheter can cause serious complications, even death. Deep learning frameworks have recently shown their potential to assist in detecting catheter malposition in radiography, yet they face three main challenges: (1) most approaches rely heavily on visual information, requiring models with many parameters for accurate detection; (2) geometric information in radiography, which is important for experts' decision making, is often underutilized because it is inherently difficult to extract accurately and to integrate with visual information; and (3) feature significance in catheter status detection is often underexplored, making the framework difficult to interpret and calling for a mechanism that highlights the key factors behind each decision. To address these challenges, an explainable variable-weight multimodal deep learning framework is proposed to fuse visual and geometric information in radiography for catheter malposition detection. A convolutional neural network (CNN) stream and a graph convolutional network (GCN) stream, each with few learnable parameters, extract the visual and geometric information without compromising performance. A cross-modal attention block captures the relationship between visual and geometric information, and a multimodal variable-weight structure fuses the modalities according to their significance. To visualize the contribution of each modality, a multimodal class activation map (MCAM) highlights the activated regions in the radiograph, showing where the framework focuses. The proposed method achieves state-of-the-art performance, with a mean AUC of 0.8816 using 7.62 million parameters.
Information Fusion | Pub Date: 2025-04-05 | DOI: 10.1016/j.inffus.2025.103164
Feng Xue, Wenzhuang Xu, Guofeng Zhong, Anlong Ming, Nicu Sebe
"Cues3D: Unleashing the power of sole NeRF for consistent and unique instances in open-vocabulary 3D panoptic segmentation" (Information Fusion, vol. 122, Article 103164)
Abstract: Open-vocabulary 3D panoptic segmentation has recently emerged as a significant trend. Top-performing methods integrate 2D segmentation with geometry-aware 3D primitives, but this advantage is lost when high-fidelity 3D point clouds are unavailable, as in methods based on Neural Radiance Fields (NeRF), which lack the capacity to maintain consistency across partial observations. To address this, recent works use contrastive losses or cross-view association pre-processing to reach view consensus. In contrast, we present Cues3D, a compact approach that relies solely on NeRF rather than on pre-associations. The core idea is that NeRF's implicit 3D field inherently establishes a globally consistent geometry, enabling effective object distinction without explicit cross-view supervision. We propose a three-phase training framework for NeRF (initialization, disambiguation, refinement) in which instance IDs are corrected using the initially learned knowledge. Additionally, an instance disambiguation method matches NeRF-rendered 3D masks to ensure globally unique 3D instance identities. With the aid of Cues3D, we obtain highly consistent and unique 3D instance IDs for each object across views with a balanced version of NeRF. Experiments on the ScanNet v2, ScanNet200, ScanNet++, and Replica datasets cover 3D instance, panoptic, and semantic segmentation. Cues3D outperforms other 2D image-based methods and competes with the latest 2D-3D merging based methods, even surpassing them when additional 3D point clouds are used. The code link can be found in the appendix and will be released on GitHub.
Information Fusion | Pub Date: 2025-04-05 | DOI: 10.1016/j.inffus.2025.103147
Hui Yu, Qingyong Wang, Xiaobo Zhou
"Adaptive-weighted federated graph convolutional networks with multi-sensor data fusion for drug response prediction" (Information Fusion, vol. 122, Article 103147)
Abstract: Drug response prediction is a vital task owing to the heterogeneity of cancer patients, enabling individualized therapy. Graph convolutional networks (GCNs) can predict the response of cancer cell lines and patients to novel drugs or drug combinations, and GCNs leveraging multi-sensor data can further improve prediction accuracy. However, data privacy and security make it difficult for GCNs to share data efficiently across institutions; without such sharing, multi-sensor data cannot be integrated, shrinking the usable data scale and degrading model performance. In addition, heterogeneous noise in multi-sensor data further degrades the learning system. To this end, we propose adaptive-weighted federated graph convolutional networks (AFGCNs), based on privacy-preserving fusion of heterogeneous multisource multiomics-drug data, to predict drug response. Specifically, AFGCNs integrates diverse multiomics and drug data to learn key internal relations under privacy protection, and captures association relationships between multisource data held by multiple parties to reweight these data for denoising, which improves performance. Experimental results demonstrate that AFGCNs outperforms state-of-the-art comparison methods by a large margin for drug response prediction, covering both single drugs and targeted drugs; in particular, AFGCNs exceeds the average F1 score of all comparison methods by approximately 8% in the random cross-validation experiment. Molecular docking experiments further validate the model's ability to accurately predict the target drug. Overall, AFGCNs bridges the gap between multiple institutions while maintaining data security and privacy, providing an effective way to accelerate drug discovery.
Information Fusion | Pub Date: 2025-04-05 | DOI: 10.1016/j.inffus.2025.103151
Xiaolun Jing, Genke Yang, Jian Chu
"TC-MGC: Text-conditioned multi-grained contrastive learning for text–video retrieval" (Information Fusion, vol. 121, Article 103151)
Abstract: Motivated by the success of coarse-grained and fine-grained contrast in text–video retrieval, multi-grained contrastive learning methods have emerged that integrate contrasts of different granularity. However, because videos span a wider semantic range than texts, text-agnostic video representations may encode misleading information not described in the text, preventing the model from capturing precise cross-modal semantic correspondence. To this end, we propose a Text-Conditioned Multi-Grained Contrast framework, dubbed TC-MGC. Specifically, our model employs a language–video attention block to generate aggregated frame and video representations conditioned on the word's and text's attention weights over frames. To filter unnecessary similarity interactions and reduce trainable parameters in the Interactive Similarity Aggregation (ISA) module, we design a Similarity Reorganization (SR) module that identifies attentive similarities and reorganizes cross-modal similarity vectors and matrices. Next, we argue that the imbalance among multi-grained similarities may cause over- and under-representation issues, and therefore introduce an auxiliary Similarity Decorrelation Regularization (SDR) loss that minimizes similarity variance on matching text–video pairs to encourage cooperative use of the similarities. Finally, we present a Linear Softmax Aggregation (LSA) module to explicitly encourage interactions between multiple similarities and promote the use of multi-grained information. Empirically, TC-MGC achieves competitive results on multiple text–video retrieval benchmarks, outperforming the X-CLIP model by +2.8% (+1.3%), +2.2% (+1.0%), and +1.5% (+0.9%) relative (absolute) improvements in text-to-video retrieval R@1 on MSR-VTT, DiDeMo, and VATEX, respectively. Our code is publicly available at https://github.com/JingXiaolun/TC-MGC.
Information Fusion | Pub Date: 2025-04-05 | DOI: 10.1016/j.inffus.2025.103165
Habib Khan, Muhammad Talha Usman, JaKeoung Koo
"Bilateral Feature Fusion with hexagonal attention for robust saliency detection under uncertain environments" (Information Fusion, vol. 121, Article 103165)
Abstract: Salient object detection (SOD) aims to accurately highlight and segment the visually prominent objects in complex visual scenes. However, current SOD approaches often lack a comprehensive framework for capturing global–local interactions and for applying appropriate attention at specific feature levels, and they struggle with visual uncertainties, illumination variations, scale differences, and complex environmental conditions. To address these gaps, we propose a fusion-centric network that leverages bilateral feature streams and hexagonal attention (BiFusHNet) for robust SOD under visual uncertainty. The network extracts intermediate features with a bilateral fusion strategy that pairs a Token-To-Token (T2T) transformer for global representation with ConvNeXt for local representation. A Hexagonal Attention Framework (HAF) integrates several attention pipelines for feature refinement: an Edge Attention module enhances fine boundary information through adaptive edge feature learning, while a Non-Local Attention architecture captures long-range dependencies via dynamic relationship modeling. A Quad-Feature Spatial Attention module employs four pooling operations (average, max, median, and variance) for thorough spatial comprehension; Multi-Head Self-Attention provides contextual representation over multi-scale features; and a Weighted Channel Attention method dynamically adjusts the importance of channel-wise features. A cross-fusion technique enables bidirectional information exchange across the feature pipelines, allowing extensive feature collaboration. We adopt uncertainty-aware training, injecting regulated visual variations to improve generalization across ten types of uncertainty. Comprehensive experiments on diverse datasets with cross-dataset evaluation demonstrate significant qualitative and quantitative improvements of BiFusHNet over leading approaches. Code, qualitative results, and trained weights will be available at https://github.com/habib1402/BiFusHNet.
Information Fusion | Pub Date: 2025-04-04 | DOI: 10.1016/j.inffus.2025.103162
Yuhao Sun, Ning Cheng, Shixin Zhang, Wenzhuang Li, Lingyue Yang, Shaowei Cui, Huaping Liu, Fuchun Sun, Jianwei Zhang, Di Guo, Wenjuan Han, Bin Fang
"Tactile data generation and applications based on visuo-tactile sensors: A review" (Information Fusion, vol. 121, Article 103162)
Abstract: Tactile sensation is an essential human sensory system, providing perception and tactile feedback. By integrating tactile sensation with other sensory information, humans exhibit remarkable environmental understanding and dexterous manipulation. Given the importance of tactile sensing, researchers have developed tactile sensors such as capacitive and piezoresistive types. In recent years, visuo-tactile sensors have gained favor in the research community for their ability to visualize tactile information. Visuo-tactile sensors are optical sensors that provide image-based tactile information at high resolution compared to electronic tactile sensors, offering new approaches for collecting tactile data within multimodal datasets. Nevertheless, owing to challenges such as wear resistance, collecting visuo-tactile data remains costly and inefficient, which limits the inclusion of tactile information in multimodal datasets. Advances in generation methods offer a promising way to overcome this inefficiency. Given the unique contribution of tactile data to multimodal datasets, this review focuses on visuo-tactile data generation. Generation methods for visuo-tactile sensors are divided into two categories by simulation approach: (1) physics-based and (2) learning-based. From the perspective of visuo-tactile data, the review also summarizes cutting-edge applications of multimodal datasets incorporating tactile information, and then discusses the remaining challenges and future directions. This review serves as a technical guide for researchers in the field and aims to promote the widespread development and application of multimodal datasets incorporating tactile information.
Information Fusion | Pub Date: 2025-04-04 | DOI: 10.1016/j.inffus.2025.103159
Yanbing Yang, Ziwei Liu, Yongkun Chen, Binyu Yan, Yimao Sun, Tao Feng
"Visible light human activity recognition driven by generative language model" (Information Fusion, vol. 121, Article 103159)
Abstract: Visible light-based indoor human activity recognition (HAR) has emerged as a promising approach because it can simultaneously provide indoor illumination, privacy protection, and sensing. However, current visible light HAR methods focus mainly on classifying individual human activities, which falls short of naturally representing activities and their contextual relations. In this paper, we extend the problem to a cross-modal alignment task between visible light signals and textual descriptions, proposing a framework that leverages generative large language models (LLMs) to decode visible light feature representations into human activity descriptions through sequence-to-sequence modeling. We implement a prototype system and build a custom dataset. Experiments in a real indoor space demonstrate that our method achieves effective natural-language-level HAR from a visible light sensing system; it promotes information fusion between visible light and natural language and moves intelligent physical information systems towards realistic applications through the integration of generative LLMs.
Information Fusion | Pub Date: 2025-04-04 | DOI: 10.1016/j.inffus.2025.103156
Min Zhang, Xinmin Song, Ju H. Park, Ben Niu
"A novel fusion entropy Kalman filter under parallel IMM framework for intermittent observation systems" (Information Fusion, vol. 121, Article 103156)
Abstract: This paper proposes a novel fusion entropy Kalman filter with intermittent observations under a parallel interacting multiple model framework (PIMM-FEIOKF), designed to enhance state estimation in complex scenarios involving intermittent observations, target maneuvers, and non-Gaussian noise. Specifically, PIMM-FEIOKF uses a fusion entropy method to integrate two interacting multiple model filters with intermittent observations: the maximum correntropy Kalman filter (IMM-MCIOKF) and the minimum error entropy Kalman filter (IMM-MEEIOKF). Both filters rely on the same connectivity matrix, which guarantees the conditions for Cholesky decomposition and ensures smooth execution of the state estimation updates. The PIMM-FEIOKF algorithm runs the two filters in parallel and dynamically selects model probabilities through a transfer probability correction function, balancing the computational efficiency of IMM-MCIOKF with the high precision of IMM-MEEIOKF while leveraging both current and past model information to improve estimation. Simulation results show that the proposed PIMM-FEIOKF improves position and velocity accuracy by 12.2% and 7.4%, respectively, compared to the advanced IMM-MEEIOKF, underscoring its robustness, efficiency, and superiority over traditional methods in challenging scenarios.
{"title":"A GAN enhanced meta-deep reinforcement learning approach for DCN routing optimization","authors":"Qing Guo , Wei Zhao , Zhuoheng Lyu , Tingting Zhao","doi":"10.1016/j.inffus.2025.103160","DOIUrl":"10.1016/j.inffus.2025.103160","url":null,"abstract":"<div><div>In large-scale data center networks (DCN), dynamic changes in traffic and topology lead to traffic patterns following a long-tail distribution (meaning a small number of traffic samples appear frequently, while most appear infrequently). This results in clear Non-IID (non-independent and identically distributed) characteristics, posing a serious challenge for traffic routing optimization. Traditional deep reinforcement learning (DRL) methods often assume relatively stable data distributions, while approaches that combine DRL with meta-learning assume only small shifts in traffic or the environment. However, such methods struggle to cope with the scarcity of samples and cross-scenario feature-space shifts caused by long-tail distributions. To address this issue, this paper proposes a global intelligent routing optimization scheme based on meta-reinforcement learning (Meta-DRL) enhanced by generative adversarial networks (GAN). First, a two-level nested Meta-DRL model is built. The lower level focuses on specific task policy optimization, while the upper level learns generalized global network parameters, improving the model’s initialization quality and generalization in unfamiliar network settings. Next, we introduce an innovative mechanism that combines GAN-based feature encoding with a meta-learning discriminator, refining the GAN’s feature discrimination boundary and greatly enhancing the quality of synthesized samples, especially where data is scarce. Furthermore, a parameter feature space mapping mechanism is proposed to unify features from new and old tasks into a shared representation space, avoiding repeated Meta-DRL training when the network environment changes. This substantially boosts the model’s generalization and decision-making efficiency. Simulation results show that, in terms of convergence speed, accumulated rewards, and network routing performance, the GAN-enhanced Meta-DRL method significantly outperforms traditional DRL approaches and other meta-learning methods.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"121 ","pages":"Article 103160"},"PeriodicalIF":14.7,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143799691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}