Information Fusion | Pub Date: 2025-05-13 | DOI: 10.1016/j.inffus.2025.103276
Jindou Zhang, Ruiqian Zhang, Xiao Huang, Zhizheng Zhang, Bowen Cai, Xianwei Lv, Zhenfeng Shao, Deren Li
Title: Joint content-aware and difference-transform lightweight network for remote sensing images semantic change detection
Abstract: Advancements in Earth observation technology have enabled effective monitoring of complex surface changes. Semantic change detection (SCD) using high-resolution remote sensing images is crucial for urban planning and environmental monitoring. However, existing deep learning-based SCD methods, which combine semantic segmentation (SS) and binary change detection (BCD), face challenges in lightweight design and consistency between semantic and change results, limiting their accuracy and applicability. To overcome these limitations, we propose the Joint Content-Aware and Difference-Transform Lightweight Network (CDLNet). CDLNet features a lightweight architecture, skip connections, and a multi-task decoding mechanism. The Temporal-Spatial Content-Aware Fusion module (TSAF) in the SS decoding branch incorporates change information to improve semantic classification accuracy within change regions. The Multi-Type Temporal Difference-Transform module (MTDT) in the BCD decoding branch enhances change localization for accurate SCD through efficient transformation of temporal difference features. Experiments on the SECOND, HiUCD mini, MSSCD, and Landsat-SCD datasets demonstrate that CDLNet outperforms thirteen state-of-the-art methods, achieving average improvements of 1.41%, 1.53%, and 1.49% in the F1_scd, IoUc, and Score metrics, respectively. Ablation studies confirm the effectiveness of the TSAF and MTDT modules and the rationality of the multi-task loss weight configuration. Furthermore, CDLNet utilizes only 20% of the parameters (12.88M) and 7.5% of the FLOPs (30.11G) of the leading model, achieving an inference speed of 41 FPS, which underscores its superior lightweight characteristics. The results indicate that CDLNet offers excellent detection performance, generalization, and robustness within a lightweight framework. The code of our paper is accessible at: https://github.com/zjd1836/CDLNet.
(Information Fusion, Volume 123, Article 103276)
Information Fusion | Pub Date: 2025-05-12 | DOI: 10.1016/j.inffus.2025.103279
Yingxiao Qiao, Qian Zhao
{"title":"A self-supervised data augmentation strategy for EEG-based emotion recognition","authors":"Yingxiao Qiao, Qian Zhao","doi":"10.1016/j.inffus.2025.103279","DOIUrl":"10.1016/j.inffus.2025.103279","url":null,"abstract":"<div><div>Due to the scarcity problem of electroencephalogram (EEG) data, building high-precision emotion recognition models using deep learning faces great challenges. In recent years, data augmentation has significantly enhanced deep learning performance. Therefore, this paper proposed an innovative self-supervised data augmentation strategy, named SSDAS-EER, to generate high-quality and various artificial EEG feature maps. Firstly, EEG feature maps were constructed by combining differential entropy (DE) and power spectral density (PSD) features to obtain rich spatial and spectral information. Secondly, a masking strategy was used to mask part of the EEG feature maps, which prompted the designed generative adversarial network (GAN) to focus on learning the unmasked feature information and effectively filled in the masked parts. Meanwhile, the elaborated GAN could accurately capture the distribution characteristics of spatial and spectral information, thus ensuring the quality of the generated artificial EEG feature maps. In particular, this paper introduced a self-supervised learning mechanism to further optimize the designed classifier with good generalization ability to the generated samples. This strategy integrated data augmentation and model training into an end-to-end pipeline capable of augmenting EEG data for each subject. In this study, a systematic experiment was conducted on the DEAP dataset, and the results showed that the proposed method achieved an average accuracy of 97.27% and 97.45% on all subjects in valence and arousal, respectively, which was 1.46% and 1.39% higher compared to the time before the strategy was applied. Simultaneously, the similarity between the generated EEG feature maps and the original EEG feature maps was verified. These results indicated that SSDAS-EER had significant performance improvement in EEG emotion recognition tasks, demonstrating its great potential in improving the efficiency of EEG data utilization and emotion recognition accuracy.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103279"},"PeriodicalIF":14.7,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144084094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information Fusion | Pub Date: 2025-05-12 | DOI: 10.1016/j.inffus.2025.103281
Yuqun Yang, Jichen Xu, Mengyuan Xu, Xu Tang, Bo Wang, Kechen Shu, Zheng You
Title: FSVS-Net: A few-shot semi-supervised vessel segmentation network for multiple organs based on feature distillation and bidirectional weighted fusion
Abstract: Accurate 3D vessel mapping is essential for surgical planning and interventional treatments. However, conventional manual slice-by-slice annotation in CT scans is extremely time-consuming due to the complexity of vessels: sparse distribution, intricate 3D topology, varying sizes, irregular shapes, and low contrast with the background. To address this problem, we propose a few-shot semi-supervised vessel segmentation network (FSVS-Net) applicable to multiple organs. It leverages a few annotated slices to segment vessel regions in unannotated slices, enabling efficient semi-supervised processing of entire CT sequences. Specifically, we propose a feature distillation module for FSVS-Net to enhance vessel-specific semantic representations and suppress irrelevant background features. In addition, we design a bidirectional weighted fusion strategy that propagates information from a few annotated slices to unannotated ones in both directions of the CT sequence, effectively modeling 3D vessel continuity and reducing error accumulation. Extensive experiments on three datasets (hepatic vessels, pulmonary vessels, and renal arteries) demonstrate that FSVS-Net achieves state-of-the-art performance on the few-shot vessel segmentation task, significantly outperforming existing methods. We collected and annotated three vessel datasets, including clinical data from Tsinghua Changgung Hospital and public sources (e.g., MSD08), for this study. In practice, FSVS-Net reduces the average annotation time from 2 h to 0.5 h per volume, improving efficiency by 4×. We release the three organ-specific vessel datasets and the implementation code at: https://github.com/YqunYang/FSVS-Net.
(Information Fusion, Volume 123, Article 103281)
Information Fusion | Pub Date: 2025-05-11 | DOI: 10.1016/j.inffus.2025.103274
Yifeng Wang, Yi Zhao
{"title":"General pre-trained inertial signal feature extraction based on temporal memory fusion","authors":"Yifeng Wang, Yi Zhao","doi":"10.1016/j.inffus.2025.103274","DOIUrl":"10.1016/j.inffus.2025.103274","url":null,"abstract":"<div><div>Inertial sensors are widely used in smartphones, robotics, wearables, aerospace systems, and industrial automation. However, extracting universal features from inertial signals remains challenging. Inertial signal features are encoded in abstract, unreadable waveforms, lacking the visual intuitiveness of images, which makes semantic extraction difficult. The non-stationary nature and complex motion patterns further complicate the feature extraction process. Moreover, the lack of large-scale annotated inertial datasets limits deep learning models to learn universal features and generalize them across expansive applications of inertial sensors. To this end, we propose a Topology Guided Feature Extraction (TG-FE) approach for general inertial signal feature extraction. TG-FE fuses time-series information into graph representations, constructing a Memory Graph by emulating the complex network characteristics of human memory. Guided by small-world network principles, this graph integrates local and global information while sparsity constraints emphasize critical feature interactions. The Memory Graph preserves nonlinear relationships and higher-order dependencies, enabling the model to generalize across scenarios with minimal task-specific tuning. Furthermore, a Cross-Graph Feature Fusion mechanism integrates information across stacked TG-FE modules to enhance representation ability and ensure stable gradient flow. With self-supervised pre-training, the TG-FE modules require only minimal fine-tuning to adapt to various hardware configurations and task scenarios, consistently outperforming comparison methods across all evaluations. Compared to the current state-of-the-art method, our TG-FE achieves 11.7% and 20.0% error reduction in attitude and displacement estimation tasks. Notably, TG-FE achieves an order-of-magnitude advantage in stability evaluations, maintaining robust performance even under 20% noise conditions where competing methods degrade significantly. Overall, this work offers a solution for general inertial signal feature extraction and opens new avenues for applying graph-based deep learning to capture and represent sequential signal features.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103274"},"PeriodicalIF":14.7,"publicationDate":"2025-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143936280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory recall: Retrieval-Augmented mind reconstruction for brain decoding","authors":"Yuxiao Zhao , Guohua Dong , Lei Zhu , Xiaomin Ying","doi":"10.1016/j.inffus.2025.103280","DOIUrl":"10.1016/j.inffus.2025.103280","url":null,"abstract":"<div><div>Reconstructing visual stimuli from functional magnetic resonance imaging (fMRI) is a complex challenge in neuroscience. Most existing approaches rely on mapping neural signals to pretrained models to generate latent variables, which are then used to reconstruct images via a diffusion model. However, this multi-step process can result in the loss of crucial semantic details, limiting reconstruction accuracy. In this paper, we introduce a novel brain decoding framework, called Memory Recall (MR), inspired by bionic brain mechanisms. MR mimics the human visual perception process, where the brain retrieves stored visual experiences to compensate for incomplete visual cues. Initially, low- and high-level visual cues are extracted using spatial mapping techniques based on VAE and CLIP, replicating the brain’s ability to interpret degraded stimuli. A visual experience database is then created to retrieve complementary information that enriches these high-level representations, simulating the brain’s memory retrieval process. Finally, an Attentive Visual Signal Fusion Network (AVSFN) with a novel attention scoring mechanism integrates the retrieved information, enhancing the generative model’s performance and emulating the brain’s refinement of visual perception. Experimental results show that MR outperforms state-of-the-art models across multiple evaluation metrics and subjective assessments. Moreover, our results provide new evidence supporting a well-known psychological conclusion that the basic information capacity of short-term memory is four items, further demonstrating the informativeness and interpretability of our model.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103280"},"PeriodicalIF":14.7,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143936278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-supervised representation learning for geospatial objects: A survey","authors":"Yile Chen , Weiming Huang , Kaiqi Zhao , Yue Jiang , Gao Cong","doi":"10.1016/j.inffus.2025.103265","DOIUrl":"10.1016/j.inffus.2025.103265","url":null,"abstract":"<div><div>The proliferation of various data sources in urban and territorial environments has significantly facilitated the development of geospatial artificial intelligence (GeoAI) across a wide range of geospatial applications. However, geospatial data, which is inherently linked to geospatial objects, often exhibits data heterogeneity that necessitates specialized fusion and representation strategies while simultaneously being inherently sparse in labels for downstream tasks. Consequently, there is a growing demand for techniques that can effectively leverage geospatial data without heavy reliance on task-specific labels and model designs. This need aligns with the principles of self-supervised learning (SSL), which has garnered increasing attention for its ability to learn effective and generalizable representations directly from data without extensive labeled supervision. This paper presents a comprehensive and up-to-date survey of SSL techniques specifically applied to or developed for geospatial objects in three primary vector geometric types: <em>Point</em>, <em>Polyline</em>, and <em>Polygon</em>. We systematically categorize various SSL techniques into predictive and contrastive methods, and analyze their adaptation to different data types for representation learning across various downstream tasks. Furthermore, we examine the emerging trends in SSL for geospatial objects, particularly the gradual advancements towards geospatial foundation models. Finally, we discuss key challenges in current research and outline promising directions for future investigation. By offering a structured analysis of existing studies, this paper aims to inspire continued progress in integrating SSL with geospatial objects, and the development of geospatial foundation models in a longer term.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103265"},"PeriodicalIF":14.7,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143936277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information Fusion | Pub Date: 2025-05-09 | DOI: 10.1016/j.inffus.2025.103303
Tahir Mahmood, Ganbayar Batchuluun, Seung Gu Kim, Jung Soo Kim, Kang Ryoung Park
{"title":"A lightweight hierarchical feature fusion network for surgical instrument segmentation in internet of medical things","authors":"Tahir Mahmood, Ganbayar Batchuluun, Seung Gu Kim, Jung Soo Kim, Kang Ryoung Park","doi":"10.1016/j.inffus.2025.103303","DOIUrl":"10.1016/j.inffus.2025.103303","url":null,"abstract":"<div><div>Minimally invasive surgeries (MIS) enhance patient outcomes but pose challenges such as limited visibility, complex hand-eye coordination, and manual endoscope control. The rise of the Internet of Medical Things (IoMT) and telesurgery further demands efficient and lightweight solutions. To address these limitations, we propose a novel lightweight hierarchical feature fusion network (LHFF-Net) for surgical instrument segmentation. LHFF-Net integrates high-, mid-, and low-level encoder features through three novel modules: the multiscale feature aggregation (MFA) module which can capture fine-grained and coarse features across scales, the enhanced spatial attention (ESA) module, prioritizing critical spatial regions, and the enhanced edge module (EEM), refining boundary delineation.</div><div>The proposed model was evaluated on two benchmark datasets, Kvasir-Instrument and UW-Sinus-Surgery, achieving mean Dice coefficients (mDC) of 97.87 % and 88.83 %, respectively, along with mean intersection over union (mIOU) scores of 95.87 % and 84.33 %. These results highlight LHFF-Net’s ability to deliver high segmentation accuracy while maintaining computational efficiency with only 2.2 million parameters. This combination of performance and efficiency makes LHFF-Net a robust solution for IoMT applications, enabling real-time telesurgery and driving innovations in healthcare.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103303"},"PeriodicalIF":14.7,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143936279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information Fusion | Pub Date: 2025-05-09 | DOI: 10.1016/j.inffus.2025.103246
Pei Zhang, Lihua Zhou, Yong Li, Hongmei Chen, Lizhen Wang
Title: Self-supervised learning of invariant causal representation in heterogeneous information network
Abstract: Invariant learning on graphs is essential for uncovering causal relationships in complex phenomena. However, most research has focused on homogeneous information networks with single node and edge types, ignoring the rich heterogeneity of real-world systems. Additionally, many invariant learning methods rely on labeled data and the design of complex graph augmentation or contrastive sampling algorithms, requiring domain-specific expertise or substantial human resources and making them difficult to implement in practical applications. To overcome these limitations, we propose a Generative-Contrastive Collaborative Self-Supervised Learning (GCCS) framework. This framework combines the ability of generative learning to mine supervisory signals from the data itself with the capacity of contrastive learning to learn invariant representations, enabling self-supervised learning of invariant causal representations from heterogeneous information networks (HINs). Specifically, generative self-supervised learning (SSL) constructs meta-path aware adjacency matrices and performs a mask-reconstruct operation, while contrastive SSL refines the learned representations by enforcing similarity and consensus constraints across different views. This joint optimization captures invariant causal features, enhancing the model's robustness. Extensive experiments on three real-world HIN datasets demonstrate that GCCS outperforms state-of-the-art baselines, particularly in noisy and complex environments, showcasing its superior performance and robustness for self-supervised learning in heterogeneous graph structures.
(Information Fusion, Volume 123, Article 103246)
Information Fusion | Pub Date: 2025-05-08 | DOI: 10.1016/j.inffus.2025.103287
Bing Wang, Weizi Li, Anthony Bradlow, Archie Watt, Antoni T.Y. Chan, Eghosa Bazuaye
{"title":"Corrigendum to “Multi-stage multimodal fusion network with language models and uncertainty evaluation for early risk stratification in rheumatic and musculoskeletal diseases” [Information Fusion, Volume 120 (2025) 103068]","authors":"Bing Wang, Weizi Li, Anthony Bradlow, Archie Watt, Antoni T.Y. Chan, Eghosa Bazuaye","doi":"10.1016/j.inffus.2025.103287","DOIUrl":"10.1016/j.inffus.2025.103287","url":null,"abstract":"","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"122 ","pages":"Article 103287"},"PeriodicalIF":14.7,"publicationDate":"2025-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143946692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information Fusion | Pub Date: 2025-05-08 | DOI: 10.1016/j.inffus.2025.103273
Weipeng Hu, Jiun Tian Hoe, Runzhong Zhang, Yiming Yang, Haifeng Hu, Yap-Peng Tan
Title: Motion-guided token prioritization and semantic degradation fusion for exo-to-ego cross-view video generation
Abstract: Exocentric (third-person) to egocentric (first-person) cross-view video generation aims to synthesize the egocentric view of a video from an exocentric view. However, current techniques either use a sub-optimal image-based approach that ignores temporal information, or require target-view cues that limit application flexibility. In this paper, we tackle the challenging cue-free Exocentric-to-Egocentric Video Generation (E2VG) problem with a video-based method, called motion-guided Token Prioritization and semantic Degradation Fusion (TPDF). Because motion cues can provide useful overlapping trails between the two views by tracking the movement of the human and the objects of interest, the proposed motion-guided token prioritization incorporates motion cues to adaptively distinguish between informative and uninformative tokens. Specifically, our design of the Motion-guided Spatial token Prioritization Transformer (MSPT) and the Motion-guided Temporal token Prioritization Transformer (MTPT) incorporates motion cues to adaptively identify patches/tokens as informative or uninformative under orthogonal constraints, ensuring accurate attention retrieval and spatial-temporal consistency in cross-view generation. Additionally, we present a Semantic Degradation Fusion (SDF) module that progressively learns egocentric semantics through a degradation learning mechanism, enabling our model to infer egocentric-view content. By extending into a cascaded fashion, the Cascaded token Prioritization and Degradation fusion (CPD) enhances attention learning with informative tokens and fuses egocentric semantics at different levels of granularity. Extensive experiments demonstrate that our method is quantitatively and qualitatively superior to state-of-the-art approaches.
(Information Fusion, Volume 123, Article 103273)