Information Fusion: Latest Articles

Joint content-aware and difference-transform lightweight network for remote sensing images semantic change detection
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date: 2025-05-13 DOI: 10.1016/j.inffus.2025.103276
Jindou Zhang, Ruiqian Zhang, Xiao Huang, Zhizheng Zhang, Bowen Cai, Xianwei Lv, Zhenfeng Shao, Deren Li
{"title":"Joint content-aware and difference-transform lightweight network for remote sensing images semantic change detection","authors":"Jindou Zhang ,&nbsp;Ruiqian Zhang ,&nbsp;Xiao Huang ,&nbsp;Zhizheng Zhang ,&nbsp;Bowen Cai ,&nbsp;Xianwei Lv ,&nbsp;Zhenfeng Shao ,&nbsp;Deren Li","doi":"10.1016/j.inffus.2025.103276","DOIUrl":"10.1016/j.inffus.2025.103276","url":null,"abstract":"<div><div>Advancements in Earth observation technology have enabled effective monitoring of complex surface changes. Semantic change detection (SCD) using high-resolution remote sensing images is crucial for urban planning and environmental monitoring. However, existing deep learning-based SCD methods, which combine semantic segmentation (SS) and binary change detection (BCD), face challenges in lightweight design and consistency between semantic and change results, limiting their accuracy and applicability. To overcome these limitations, we propose the Joint Content-Aware and Difference-Transform Lightweight Network (CDLNet). CDLNet features a lightweight architecture, skip connections, and a multi-task decoding mechanism. The Temporal-Spatial Content-Aware Fusion module (TSAF) in the SS decoding branch incorporates change information to improve semantic classification accuracy within change regions. The Multi-Type Temporal Difference-Transform module (MTDT) in the BCD decoding branch enhances change localization for accurate SCD through efficient transformation of temporal difference features. Experiments on the SECOND, HiUCD mini, MSSCD, and Landsat-SCD datasets demonstrate that CDLNet outperforms thirteen state-of-the-art methods, achieving average improvements of 1.41%, 1.53% and 1.49% in the <span><math><mrow><mi>F</mi><msub><mrow><mn>1</mn></mrow><mrow><mi>s</mi><mi>c</mi><mi>d</mi></mrow></msub></mrow></math></span>, <span><math><mrow><mi>I</mi><mi>o</mi><mi>U</mi><mi>c</mi></mrow></math></span> and <span><math><mrow><mi>S</mi><mi>c</mi><mi>o</mi><mi>r</mi><mi>e</mi></mrow></math></span> metrics, respectively. Ablation studies confirm the effectiveness of the TSAF and MTDT modules and the rationality of multi-task loss weight configuration. Furthermore, CDLNet utilizes only 20% of the parameters (12.88M) and 7.5% of the FLOPs (30.11G) of the leading model, achieving an inference speed of 41 FPS, which underscores its superior lightweight characteristics. The results indicate that CDLNet offers excellent detection performance, generalization, and robustness within a lightweight framework. The code of our paper is accessible at: <span><span>https://github.com/zjd1836/CDLNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103276"},"PeriodicalIF":14.7,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144068227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
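To make the difference-transform idea concrete, below is a minimal PyTorch sketch of a bitemporal block that computes several simple "difference types" from two co-registered feature maps and fuses them into one change representation. The module name, layer choices, and shapes are illustrative assumptions, not CDLNet's actual MTDT; the authors' implementation is in the linked repository.

```python
# Sketch only: a lightweight difference-transform block for bitemporal
# change detection. All names and layer choices are assumptions.
import torch
import torch.nn as nn

class TemporalDifferenceTransform(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # One 1x1 conv per difference type keeps the block lightweight.
        self.proj_abs = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj_signed = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj_concat = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.fuse = nn.Sequential(
            nn.Conv2d(3 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
        d_abs = self.proj_abs((f1 - f2).abs())                 # change magnitude
        d_signed = self.proj_signed(f2 - f1)                   # change direction
        d_cat = self.proj_concat(torch.cat([f1, f2], dim=1))   # joint content
        return self.fuse(torch.cat([d_abs, d_signed, d_cat], dim=1))

# Toy usage: bitemporal feature maps from a shared (weight-tied) encoder.
f1, f2 = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
change_feat = TemporalDifferenceTransform(64)(f1, f2)
print(change_feat.shape)  # torch.Size([2, 64, 32, 32])
```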
A self-supervised data augmentation strategy for EEG-based emotion recognition
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date: 2025-05-12 DOI: 10.1016/j.inffus.2025.103279
Yingxiao Qiao, Qian Zhao
{"title":"A self-supervised data augmentation strategy for EEG-based emotion recognition","authors":"Yingxiao Qiao,&nbsp;Qian Zhao","doi":"10.1016/j.inffus.2025.103279","DOIUrl":"10.1016/j.inffus.2025.103279","url":null,"abstract":"<div><div>Due to the scarcity problem of electroencephalogram (EEG) data, building high-precision emotion recognition models using deep learning faces great challenges. In recent years, data augmentation has significantly enhanced deep learning performance. Therefore, this paper proposed an innovative self-supervised data augmentation strategy, named SSDAS-EER, to generate high-quality and various artificial EEG feature maps. Firstly, EEG feature maps were constructed by combining differential entropy (DE) and power spectral density (PSD) features to obtain rich spatial and spectral information. Secondly, a masking strategy was used to mask part of the EEG feature maps, which prompted the designed generative adversarial network (GAN) to focus on learning the unmasked feature information and effectively filled in the masked parts. Meanwhile, the elaborated GAN could accurately capture the distribution characteristics of spatial and spectral information, thus ensuring the quality of the generated artificial EEG feature maps. In particular, this paper introduced a self-supervised learning mechanism to further optimize the designed classifier with good generalization ability to the generated samples. This strategy integrated data augmentation and model training into an end-to-end pipeline capable of augmenting EEG data for each subject. In this study, a systematic experiment was conducted on the DEAP dataset, and the results showed that the proposed method achieved an average accuracy of 97.27% and 97.45% on all subjects in valence and arousal, respectively, which was 1.46% and 1.39% higher compared to the time before the strategy was applied. Simultaneously, the similarity between the generated EEG feature maps and the original EEG feature maps was verified. These results indicated that SSDAS-EER had significant performance improvement in EEG emotion recognition tasks, demonstrating its great potential in improving the efficiency of EEG data utilization and emotion recognition accuracy.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103279"},"PeriodicalIF":14.7,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144084094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
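The mask-then-fill augmentation idea can be sketched in a few lines. The snippet below assumes DE/PSD features arranged on a 2-D electrode grid with five frequency bands (a common convention for DEAP, but an assumption here); the generator is a stand-in, not the paper's GAN.

```python
# Sketch only: mask part of an EEG feature map, train a generator to
# inpaint the hidden region, and keep real values where they are known.
import torch
import torch.nn as nn

def random_block_mask(x: torch.Tensor, block: int = 4) -> torch.Tensor:
    """Zero out one random block x block spatial patch per sample."""
    b, _, h, w = x.shape
    mask = torch.ones_like(x)
    for i in range(b):
        r = torch.randint(0, h - block + 1, (1,)).item()
        c = torch.randint(0, w - block + 1, (1,)).item()
        mask[i, :, r:r + block, c:c + block] = 0.0
    return mask

generator = nn.Sequential(  # stand-in for the paper's GAN generator
    nn.Conv2d(5, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 5, 3, padding=1),
)

feat = torch.randn(8, 5, 9, 9)         # batch of 5-band DE/PSD feature maps
mask = random_block_mask(feat)
filled = generator(feat * mask)        # generator sees only unmasked content
augmented = feat * mask + filled * (1 - mask)       # blend real and generated
recon_loss = ((filled - feat) * (1 - mask)).pow(2).mean()  # self-supervision
```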
FSVS-Net: A few-shot semi-supervised vessel segmentation network for multiple organs based on feature distillation and bidirectional weighted fusion
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date: 2025-05-12 DOI: 10.1016/j.inffus.2025.103281
Yuqun Yang, Jichen Xu, Mengyuan Xu, Xu Tang, Bo Wang, Kechen Shu, Zheng You
{"title":"FSVS-Net: A few-shot semi-supervised vessel segmentation network for multiple organs based on feature distillation and bidirectional weighted fusion","authors":"Yuqun Yang ,&nbsp;Jichen Xu ,&nbsp;Mengyuan Xu ,&nbsp;Xu Tang ,&nbsp;Bo Wang ,&nbsp;Kechen Shu ,&nbsp;Zheng You","doi":"10.1016/j.inffus.2025.103281","DOIUrl":"10.1016/j.inffus.2025.103281","url":null,"abstract":"<div><div>Accurate 3D vessel mapping is essential for surgical planning and interventional treatments. However, the conventional manual slice-by-slice annotation in CT scans is extremely time-consuming, due to the complexity of vessels: sparse distribution, intricate 3D topology, varying sizes, irregular shapes, and low contrast with the background. To address this problem, we propose a few-shot semi-supervised vessel segmentation network (FSVS-Net) applicable to multiple organs. It can leverage a few annotated slices to segment vessel regions in unannotated slices, enabling efficient semi-supervised processing of the entire CT sequences. Specifically, we propose a feature distillation module for FSVS-Net to enhance vessel-specific semantic representations and suppress irrelevant background features. In addition, we design a bidirectional weighted fusion strategy that propagates information from a few annotated slices to unannotated ones in both opposite directions of the CT sequence, effectively modeling 3D vessel continuity and reducing error accumulation. Extensive experiments on three datasets (hepatic vessels, pulmonary vessels, and renal arteries) demonstrate that FSVS-Net achieves state-of-the-art performance in few-shot vessel segmentation task, significantly outperforming existing methods. We collected and annotated three vessel datasets, including clinical data from Tsinghua Changgung Hospital and public sources (e.g., MSD08), for this study. In practice, it reduces the average annotation time from 2 h to 0.5 h per volume, improving efficiency by 4<span><math><mo>×</mo></math></span>. We release three organ-specific vessel datasets and the implementation code at: <span><span>https://github.com/YqunYang/FSVS-Net</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103281"},"PeriodicalIF":14.7,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144071140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
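The bidirectional propagation idea reduces to a distance-weighted blend of two per-slice predictions. Below is an assumption-level NumPy sketch, not FSVS-Net itself: predictions propagated forward from the previous annotated slice and backward from the next one are blended, trusting each direction less as distance to its annotation grows.

```python
# Sketch only: distance-weighted bidirectional fusion along a CT sequence.
import numpy as np

def bidirectional_fuse(fwd, bwd, idx, prev_ann, next_ann):
    """fwd/bwd: probability maps propagated from each annotated slice."""
    d_f = idx - prev_ann        # distance to previous annotated slice
    d_b = next_ann - idx        # distance to next annotated slice
    w_f = d_b / (d_f + d_b)     # the nearer annotation gets the larger weight
    return w_f * fwd + (1 - w_f) * bwd

prob_fwd = np.random.rand(64, 64)   # propagated from annotated slice 10
prob_bwd = np.random.rand(64, 64)   # propagated from annotated slice 20
fused = bidirectional_fuse(prob_fwd, prob_bwd, idx=13, prev_ann=10, next_ann=20)
vessel_mask = fused > 0.5           # binarized vessel prediction for slice 13
```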
General pre-trained inertial signal feature extraction based on temporal memory fusion
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date: 2025-05-11 DOI: 10.1016/j.inffus.2025.103274
Yifeng Wang, Yi Zhao
{"title":"General pre-trained inertial signal feature extraction based on temporal memory fusion","authors":"Yifeng Wang,&nbsp;Yi Zhao","doi":"10.1016/j.inffus.2025.103274","DOIUrl":"10.1016/j.inffus.2025.103274","url":null,"abstract":"<div><div>Inertial sensors are widely used in smartphones, robotics, wearables, aerospace systems, and industrial automation. However, extracting universal features from inertial signals remains challenging. Inertial signal features are encoded in abstract, unreadable waveforms, lacking the visual intuitiveness of images, which makes semantic extraction difficult. The non-stationary nature and complex motion patterns further complicate the feature extraction process. Moreover, the lack of large-scale annotated inertial datasets limits deep learning models to learn universal features and generalize them across expansive applications of inertial sensors. To this end, we propose a Topology Guided Feature Extraction (TG-FE) approach for general inertial signal feature extraction. TG-FE fuses time-series information into graph representations, constructing a Memory Graph by emulating the complex network characteristics of human memory. Guided by small-world network principles, this graph integrates local and global information while sparsity constraints emphasize critical feature interactions. The Memory Graph preserves nonlinear relationships and higher-order dependencies, enabling the model to generalize across scenarios with minimal task-specific tuning. Furthermore, a Cross-Graph Feature Fusion mechanism integrates information across stacked TG-FE modules to enhance representation ability and ensure stable gradient flow. With self-supervised pre-training, the TG-FE modules require only minimal fine-tuning to adapt to various hardware configurations and task scenarios, consistently outperforming comparison methods across all evaluations. Compared to the current state-of-the-art method, our TG-FE achieves 11.7% and 20.0% error reduction in attitude and displacement estimation tasks. Notably, TG-FE achieves an order-of-magnitude advantage in stability evaluations, maintaining robust performance even under 20% noise conditions where competing methods degrade significantly. Overall, this work offers a solution for general inertial signal feature extraction and opens new avenues for applying graph-based deep learning to capture and represent sequential signal features.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103274"},"PeriodicalIF":14.7,"publicationDate":"2025-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143936280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
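The small-world flavor of such a graph can be pictured with a Watts-Strogatz-style recipe: windows of the signal become nodes, temporal neighbors give local ring edges, and a few random shortcuts provide long-range links. The sketch below illustrates that flavor only; the windowing, edge rules, and parameters are assumptions, not the paper's Memory Graph construction.

```python
# Sketch only: turn a 1-D inertial signal into a small-world-style graph.
import numpy as np

def memory_graph(signal: np.ndarray, win: int = 16, k: int = 2, p: float = 0.1):
    # Non-overlapping windows become graph nodes (an assumed discretization).
    windows = np.stack([signal[i:i + win]
                        for i in range(0, len(signal) - win + 1, win)])
    n = len(windows)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(1, k + 1):   # ring lattice: connect k temporal neighbors
            adj[i, (i + j) % n] = adj[(i + j) % n, i] = 1.0
    rng = np.random.default_rng(0)
    shortcut = np.triu(rng.random((n, n)) < p, 1)  # sparse long-range shortcuts
    adj = np.maximum(adj, (shortcut | shortcut.T).astype(float))
    return windows, adj             # node features and adjacency matrix

acc = np.random.randn(512)              # one accelerometer channel
nodes, adj = memory_graph(acc)
print(nodes.shape, int(adj.sum() / 2))  # window-node features, edge count
```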
Memory recall: Retrieval-Augmented mind reconstruction for brain decoding
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date: 2025-05-10 DOI: 10.1016/j.inffus.2025.103280
Yuxiao Zhao, Guohua Dong, Lei Zhu, Xiaomin Ying
{"title":"Memory recall: Retrieval-Augmented mind reconstruction for brain decoding","authors":"Yuxiao Zhao ,&nbsp;Guohua Dong ,&nbsp;Lei Zhu ,&nbsp;Xiaomin Ying","doi":"10.1016/j.inffus.2025.103280","DOIUrl":"10.1016/j.inffus.2025.103280","url":null,"abstract":"<div><div>Reconstructing visual stimuli from functional magnetic resonance imaging (fMRI) is a complex challenge in neuroscience. Most existing approaches rely on mapping neural signals to pretrained models to generate latent variables, which are then used to reconstruct images via a diffusion model. However, this multi-step process can result in the loss of crucial semantic details, limiting reconstruction accuracy. In this paper, we introduce a novel brain decoding framework, called Memory Recall (MR), inspired by bionic brain mechanisms. MR mimics the human visual perception process, where the brain retrieves stored visual experiences to compensate for incomplete visual cues. Initially, low- and high-level visual cues are extracted using spatial mapping techniques based on VAE and CLIP, replicating the brain’s ability to interpret degraded stimuli. A visual experience database is then created to retrieve complementary information that enriches these high-level representations, simulating the brain’s memory retrieval process. Finally, an Attentive Visual Signal Fusion Network (AVSFN) with a novel attention scoring mechanism integrates the retrieved information, enhancing the generative model’s performance and emulating the brain’s refinement of visual perception. Experimental results show that MR outperforms state-of-the-art models across multiple evaluation metrics and subjective assessments. Moreover, our results provide new evidence supporting a well-known psychological conclusion that the basic information capacity of short-term memory is four items, further demonstrating the informativeness and interpretability of our model.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103280"},"PeriodicalIF":14.7,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143936278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
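The retrieval step lends itself to a short sketch: an embedding decoded from fMRI queries a bank of stored "visual experiences" by cosine similarity, and the top-k hits are fused with softmax-scored attention. Sizes, the mixing weight, and the temperature are assumptions, not MR's actual AVSFN; k = 4 is chosen here purely to echo the four-item short-term-memory finding mentioned above.

```python
# Sketch only: cosine-similarity retrieval plus attention-weighted fusion.
import torch
import torch.nn.functional as F

bank = F.normalize(torch.randn(10_000, 768), dim=-1)  # stored experiences
query = F.normalize(torch.randn(1, 768), dim=-1)      # embedding decoded from fMRI

sims = query @ bank.T                                 # cosine similarities
topk = sims.topk(k=4, dim=-1)                         # four retrieved items
retrieved = bank[topk.indices[0]]                     # shape (4, 768)

attn = torch.softmax(topk.values[0] / 0.07, dim=-1)   # attention scores
enriched = 0.5 * query + 0.5 * (attn @ retrieved)     # fused representation
print(enriched.shape)  # torch.Size([1, 768]) -> fed to the image generator
```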
Self-supervised representation learning for geospatial objects: A survey
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date: 2025-05-09 DOI: 10.1016/j.inffus.2025.103265
Yile Chen, Weiming Huang, Kaiqi Zhao, Yue Jiang, Gao Cong
{"title":"Self-supervised representation learning for geospatial objects: A survey","authors":"Yile Chen ,&nbsp;Weiming Huang ,&nbsp;Kaiqi Zhao ,&nbsp;Yue Jiang ,&nbsp;Gao Cong","doi":"10.1016/j.inffus.2025.103265","DOIUrl":"10.1016/j.inffus.2025.103265","url":null,"abstract":"<div><div>The proliferation of various data sources in urban and territorial environments has significantly facilitated the development of geospatial artificial intelligence (GeoAI) across a wide range of geospatial applications. However, geospatial data, which is inherently linked to geospatial objects, often exhibits data heterogeneity that necessitates specialized fusion and representation strategies while simultaneously being inherently sparse in labels for downstream tasks. Consequently, there is a growing demand for techniques that can effectively leverage geospatial data without heavy reliance on task-specific labels and model designs. This need aligns with the principles of self-supervised learning (SSL), which has garnered increasing attention for its ability to learn effective and generalizable representations directly from data without extensive labeled supervision. This paper presents a comprehensive and up-to-date survey of SSL techniques specifically applied to or developed for geospatial objects in three primary vector geometric types: <em>Point</em>, <em>Polyline</em>, and <em>Polygon</em>. We systematically categorize various SSL techniques into predictive and contrastive methods, and analyze their adaptation to different data types for representation learning across various downstream tasks. Furthermore, we examine the emerging trends in SSL for geospatial objects, particularly the gradual advancements towards geospatial foundation models. Finally, we discuss key challenges in current research and outline promising directions for future investigation. By offering a structured analysis of existing studies, this paper aims to inspire continued progress in integrating SSL with geospatial objects, and the development of geospatial foundation models in a longer term.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103265"},"PeriodicalIF":14.7,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143936277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
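Much of the contrastive family the survey covers optimizes some variant of the InfoNCE objective; a generic sketch follows, not tied to any one surveyed method. Two augmented "views" of the same geospatial object are pulled together while other objects in the batch are pushed apart; the batch size, dimension, and temperature are illustrative.

```python
# Sketch only: the generic InfoNCE contrastive loss over two views.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1):
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / tau             # pairwise similarity matrix
    targets = torch.arange(len(z1))      # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

z_view1 = torch.randn(32, 128)  # e.g., POI embeddings under augmentation 1
z_view2 = torch.randn(32, 128)  # the same POIs under augmentation 2
loss = info_nce(z_view1, z_view2)
```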
A lightweight hierarchical feature fusion network for surgical instrument segmentation in internet of medical things
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date: 2025-05-09 DOI: 10.1016/j.inffus.2025.103303
Tahir Mahmood, Ganbayar Batchuluun, Seung Gu Kim, Jung Soo Kim, Kang Ryoung Park
{"title":"A lightweight hierarchical feature fusion network for surgical instrument segmentation in internet of medical things","authors":"Tahir Mahmood,&nbsp;Ganbayar Batchuluun,&nbsp;Seung Gu Kim,&nbsp;Jung Soo Kim,&nbsp;Kang Ryoung Park","doi":"10.1016/j.inffus.2025.103303","DOIUrl":"10.1016/j.inffus.2025.103303","url":null,"abstract":"<div><div>Minimally invasive surgeries (MIS) enhance patient outcomes but pose challenges such as limited visibility, complex hand-eye coordination, and manual endoscope control. The rise of the Internet of Medical Things (IoMT) and telesurgery further demands efficient and lightweight solutions. To address these limitations, we propose a novel lightweight hierarchical feature fusion network (LHFF-Net) for surgical instrument segmentation. LHFF-Net integrates high-, mid-, and low-level encoder features through three novel modules: the multiscale feature aggregation (MFA) module which can capture fine-grained and coarse features across scales, the enhanced spatial attention (ESA) module, prioritizing critical spatial regions, and the enhanced edge module (EEM), refining boundary delineation.</div><div>The proposed model was evaluated on two benchmark datasets, Kvasir-Instrument and UW-Sinus-Surgery, achieving mean Dice coefficients (mDC) of 97.87 % and 88.83 %, respectively, along with mean intersection over union (mIOU) scores of 95.87 % and 84.33 %. These results highlight LHFF-Net’s ability to deliver high segmentation accuracy while maintaining computational efficiency with only 2.2 million parameters. This combination of performance and efficiency makes LHFF-Net a robust solution for IoMT applications, enabling real-time telesurgery and driving innovations in healthcare.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103303"},"PeriodicalIF":14.7,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143936279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
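The two metrics reported above have standard definitions, sketched below for binary instrument masks; this is the textbook formulation, not code from the paper, and the per-image values would be averaged over a test set to give mDC and mIoU.

```python
# Sketch only: Dice coefficient and intersection-over-union for binary masks.
import torch

def dice_iou(pred: torch.Tensor, gt: torch.Tensor, eps: float = 1e-6):
    pred, gt = pred.bool(), gt.bool()
    inter = (pred & gt).sum().float()
    dice = (2 * inter + eps) / (pred.sum() + gt.sum() + eps)
    iou = (inter + eps) / ((pred | gt).sum().float() + eps)
    return dice.item(), iou.item()

pred = torch.rand(256, 256) > 0.5   # predicted instrument mask (toy)
gt = torch.rand(256, 256) > 0.5     # ground-truth mask (toy)
print(dice_iou(pred, gt))
```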
Self-supervised learning of invariant causal representation in heterogeneous information network
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date: 2025-05-09 DOI: 10.1016/j.inffus.2025.103246
Pei Zhang, Lihua Zhou, Yong Li, Hongmei Chen, Lizhen Wang
{"title":"Self-supervised learning of invariant causal representation in heterogeneous information network","authors":"Pei Zhang ,&nbsp;Lihua Zhou ,&nbsp;Yong Li ,&nbsp;Hongmei Chen ,&nbsp;Lizhen Wang","doi":"10.1016/j.inffus.2025.103246","DOIUrl":"10.1016/j.inffus.2025.103246","url":null,"abstract":"<div><div>Invariant learning on graphs is essential for uncovering causal relationships in complex phenomena. However, most research has focused on homogeneous information networks with single node and edge types, ignoring the rich heterogeneity of real-world systems. Additionally, many invariant learning methods rely on labeled data and the design of complex graph augmentation or contrastive sampling algorithms, requiring domain-specific expertise or substantial human resources, making them difficult to implement in practical applications. To overcome these limitations, we propose a <strong>G</strong>enerative-<strong>C</strong>ontrastive <strong>C</strong>ollaborative <strong>S</strong>elf-Supervised Learning (GCCS) framework. This framework combines the ability of generative learning to mine supervisory signals from the data itself with the capacity of contrastive learning to learn invariant representations, enabling self-supervised learning of invariant causal representations from heterogeneous information networks (HINs). Specifically, generative self-supervised learning (SSL) constructs meta-path aware adjacency matrices and performs a mask-reconstruct operation, while contrastive SSL refines the learned representations by enforcing similarity and consensus constraints across different views. This joint optimization captures invariant causal features, enhancing the model’s robustness. Extensive experiments on three real-world HINs datasets demonstrate that GCCS outperforms state-of-the-art baselines, particularly in noisy and complex environments, showcasing its superior performance and robustness for self-supervised learning in heterogeneous graph structures.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103246"},"PeriodicalIF":14.7,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143935532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
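The generative mask-reconstruct branch can be pictured in a few lines: hide some entries of a meta-path adjacency matrix, encode nodes using only the visible edges, and reconstruct the hidden entries. The sketch below is an assumption-level illustration (the encoder, masking rate, and loss over hidden positive edges only are simplifications), not GCCS itself.

```python
# Sketch only: masked reconstruction of a meta-path adjacency matrix.
import torch
import torch.nn as nn

n, d = 100, 32
adj = (torch.rand(n, n) < 0.05).float()      # meta-path adjacency (e.g., A-P-A)
mask = (torch.rand(n, n) < 0.2) & (adj > 0)  # hide 20% of the existing edges

x = torch.randn(n, d)                        # node features
encoder = nn.Linear(d, d)                    # stand-in for a graph encoder
visible = adj * (~mask).float()              # message passing on visible edges
h = torch.relu(visible @ encoder(x))

recon = torch.sigmoid(h @ h.T)               # predicted edge probabilities
# Self-supervision: recover only the hidden (positive) edges.
loss = nn.functional.binary_cross_entropy(recon[mask], adj[mask])
```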
Corrigendum to “Multi-stage multimodal fusion network with language models and uncertainty evaluation for early risk stratification in rheumatic and musculoskeletal diseases” [Information Fusion, Volume 120 (2025) 103068]
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date: 2025-05-08 DOI: 10.1016/j.inffus.2025.103287
Bing Wang, Weizi Li, Anthony Bradlow, Archie Watt, Antoni T.Y. Chan, Eghosa Bazuaye
{"title":"Corrigendum to “Multi-stage multimodal fusion network with language models and uncertainty evaluation for early risk stratification in rheumatic and musculoskeletal diseases” [Information Fusion, Volume 120 (2025) 103068]","authors":"Bing Wang,&nbsp;Weizi Li,&nbsp;Anthony Bradlow,&nbsp;Archie Watt,&nbsp;Antoni T.Y. Chan,&nbsp;Eghosa Bazuaye","doi":"10.1016/j.inffus.2025.103287","DOIUrl":"10.1016/j.inffus.2025.103287","url":null,"abstract":"","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"122 ","pages":"Article 103287"},"PeriodicalIF":14.7,"publicationDate":"2025-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143946692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Motion-guided token prioritization and semantic degradation fusion for exo-to-ego cross-view video generation
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date: 2025-05-08 DOI: 10.1016/j.inffus.2025.103273
Weipeng Hu, Jiun Tian Hoe, Runzhong Zhang, Yiming Yang, Haifeng Hu, Yap-Peng Tan
{"title":"Motion-guided token prioritization and semantic degradation fusion for exo-to-ego cross-view video generation","authors":"Weipeng Hu ,&nbsp;Jiun Tian Hoe ,&nbsp;Runzhong Zhang ,&nbsp;Yiming Yang ,&nbsp;Haifeng Hu ,&nbsp;Yap-Peng Tan","doi":"10.1016/j.inffus.2025.103273","DOIUrl":"10.1016/j.inffus.2025.103273","url":null,"abstract":"<div><div>Exocentric (third-person) to egocentric (first-person) cross-view video generation aims to synthesize the egocentric view of a video from an exocentric view. However, current techniques either use a sub-optimal image-based approach that ignores temporal information, or require target-view cues that limits application flexibility. In this paper, we tackle the challenging cue-free Exocentric-to-Egocentric Video Generation (E2VG) problem via a video-based method, called motion-guided Token Prioritization and semantic Degradation Fusion (TPDF). Taking into account motion cues can provide useful overlapping trails between the two views by tracking the movement of human and the interesting objects, the proposed motion-guided token prioritization incorporates motion cues to adaptively distinguish between informative and uninformative tokens. Specifically, Our design of the Motion-guided Spatial token Prioritization Transformer (MSPT) and the Motion-guided Temporal token Prioritization Transformer (MTPT) incorporates motion cues to adaptively identify patches/tokens as informative or uninformative with orthogonal constraints, ensuring accurate attention retrieval and spatial–temporal consistency in cross-view generation. Additionally, we present a Semantic Degradation Fusion (SDF) to progressively learn egocentric semantics through a degradation learning mechanism, enabling our model to infer egocentric-view content. By extending into a cascaded fashion, the Cascaded token Prioritization and Degradation fusion (CPD) enhances attention learning with informative tokens and fuses egocentric semantic at different levels of granularity. Extensive experiments demonstrate that our method is quantitatively and qualitatively superior to the state-of-the-art approaches.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103273"},"PeriodicalIF":14.7,"publicationDate":"2025-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143941125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
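The prioritization step can be sketched schematically: a per-token motion score (e.g., mean optical-flow magnitude in the patch) splits tokens into informative and uninformative groups, and an orthogonality penalty keeps the two groups' summaries decorrelated. The scoring rule, the 50/50 split, and the penalty form below are assumptions, not TPDF's MSPT/MTPT.

```python
# Sketch only: motion-scored token split with an orthogonality penalty.
import torch

tokens = torch.randn(196, 256)     # 14x14 patch tokens of one video frame
flow_mag = torch.rand(196)         # per-patch motion magnitude (assumed given)

k = tokens.shape[0] // 2           # assumed split: top half is informative
mask = torch.zeros(196, dtype=torch.bool)
mask[flow_mag.topk(k).indices] = True

info_mean = tokens[mask].mean(dim=0)     # summary of informative tokens
noise_mean = tokens[~mask].mean(dim=0)   # summary of uninformative tokens
ortho_penalty = torch.dot(info_mean, noise_mean).pow(2)  # decorrelate groups
attended = tokens[mask]            # only informative tokens feed attention
```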