Information Fusion | Pub Date: 2025-06-21 | DOI: 10.1016/j.inffus.2025.103409 | Vol. 124, Article 103409
Title: A survey for wearable sensors empowering smart healthcare in the era of large language models
Authors: Yinghao Liu, Jian Li, Chao Lian, Xinyue Zhang, Junhui Gong, Yufan Wang, Yuliang Zhao
Abstract: Large language models (LLMs) have made significant advances in biomedical applications, including medical literature analysis and clinical note summarization. Meanwhile, intelligent wearable sensors have become essential tools for joint motion analysis and disease diagnosis thanks to their high sensitivity, real-time monitoring capabilities, and diverse application scenarios. However, effectively integrating LLMs with wearable sensors to achieve in-depth motion data analysis and intelligent health management remains a major research challenge. Traditional studies have often treated joint motion analysis and disease diagnosis as separate domains. This review provides a comprehensive analysis of wearable sensor classifications, data fusion algorithms, and their representative applications in human posture recognition and disease diagnosis, while further exploring the potential of LLMs to enhance wearable sensor capabilities. Incorporating LLMs offers the potential to uncover complex relationships between movement patterns and disease progression, facilitating more accurate health assessments and early interventions. In addressing the challenges of multi-source sensor data fusion and real-time processing, LLMs, with their powerful feature extraction and cross-modal learning capabilities, are expected to improve data processing efficiency and enable more intelligent real-time diagnostics and decision support. Additionally, energy consumption and computational load remain critical bottlenecks limiting the long-term deployment of wearable devices, and integrating self-powered sensors presents a promising avenue for enhancing data processing efficiency. This review summarizes key challenges in current technological developments and envisions the future convergence of LLMs and wearable sensors, aiming to drive the advancement of intelligent healthcare and health monitoring.

Information Fusion | Pub Date: 2025-06-21 | DOI: 10.1016/j.inffus.2025.103405 | Vol. 124, Article 103405
Title: Multimodal Named Entity Recognition based on topic prompt and multi-curriculum denoising
Authors: Mingying Xu, Kui Peng, Jie Liu, Qing Zhang, Linqi Song, Yinqiao Li
Abstract: The rapid development of Generative Large Models (GLMs) such as ChatGPT and GPT-4 has significantly enhanced their ability to handle complex tasks and driven innovation across multiple fields, especially social media. However, GLMs are prone to generating “hallucinated” content when dealing with ambiguous problems that lack clear evidence, which undermines their reliability. Multimodal Named Entity Recognition (MNER) addresses this issue by integrating image, text, and contextual information to establish a fact-based framework, thereby reducing the risk of hallucination and strengthening the reasoning foundation of GLMs. Combining GLMs with MNER merges the flexibility of content generation with evidence-based constraints, improving reliability and interpretability. In the MNER task, however, weakly related or irrelevant image information introduces noise that degrades performance. In this paper, we propose TPMCLNet, a novel framework that combines topic prompts with a multi-curriculum denoising strategy. First, the topic prompt module extracts topic information from the images and integrates this image-derived information with the text as auxiliary input, enhancing the model's understanding of multimodal data. This is particularly useful when the correlation between image and text is weak, as it provides additional semantic cues that help the model identify named entities more accurately. Additionally, we employ a denoising strategy based on multi-curriculum learning, which defines noise metrics at different granularities to progressively optimize the presentation order of the training data, reducing the impact of noise on the model. Within this framework, we conduct a comprehensive noise assessment of both images and text, gradually introducing cleaner data to improve model training. Experimental results show that, by combining topic prompts with multi-curriculum denoising, TPMCLNet significantly improves MNER performance in complex multimodal environments, demonstrating its effectiveness.

{"title":"QD-MSA : A quantum distributed tensor network framework for multimodal sentiment analysis","authors":"Yuelin Li, Yangyang Li, Zhengya Qi, Haorui Yang, Ronghua Shang, Licheng Jiao","doi":"10.1016/j.inffus.2025.103404","DOIUrl":"10.1016/j.inffus.2025.103404","url":null,"abstract":"<div><div>Multimodal sentiment analysis, which integrates data types such as audio, text, and images, is increasingly vital for understanding emotional content in the era of social media and short video platforms. Quantum computing, with its inherent characteristics like superposition and entanglement, is conceptually well-suited for multimodal learning, particularly for modal fusion. However, current quantum computers face limitations, such as a restricted number of usable qubits, hindering their ability to surpass classical computing (quantum supremacy). In this work, we propose QD-MSA, a quantum distributed multimodal sentiment analysis framework, which is the first to apply quantum circuit splitting techniques to multimodal sentiment analysis, reducing qubit usage from <span><math><mi>n</mi></math></span> to <span><math><mrow><mi>n</mi><mo>/</mo><mn>2</mn><mo>+</mo><mn>1</mn></mrow></math></span>. This advancement enables the execution of more complex quantum programs on Noisy Intermediate-Scale Quantum (NISQ) devices by partially overcoming qubit scarcity. Additionally, QD-MSA contains a novel workflow that integrates our model into quantum computer clusters, significantly enhancing computational performance and unlocking the potential of NISQ-era quantum computers. By combining classical neural networks for feature extraction with quantum models for feature fusion, our approach conserves quantum resources while achieving superior performance. Experimental evaluations on the CMU-MOSEI and CMU-MOSI datasets demonstrate that our model achieves comparable or superior performance to deep learning-based methods, with notable improvements in key metrics. Furthermore, our work represents the first successful integration of quantum computing principles into multimodal sentiment analysis, with experiments confirming that the proposed model significantly outperforms classical approaches relying solely on quantum-inspired strategies. These contributions establish a scalable and efficient framework for multimodal sentiment analysis, leveraging both classical and quantum computing paradigms to advance the field.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"124 ","pages":"Article 103404"},"PeriodicalIF":14.7,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144337828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information Fusion | Pub Date: 2025-06-20 | DOI: 10.1016/j.inffus.2025.103397 | Vol. 124, Article 103397
Title: A self-supervised multimodal framework for 1D physiological data fusion in remote health monitoring
Authors: Manuel Lage Cañellas, Constantino Álvarez Casado, Le Nguyen, Miguel Bordallo López
Abstract: The growth of labeled data for remote healthcare analysis lags far behind the rapid expansion of raw data, creating a significant bottleneck. To address this, we propose a multimodal self-supervised learning (SSL) framework for 1D signals that leverages unlabeled physiological data. Our architecture fuses heart and respiration waveforms from three sensors (mmWave radar, RGB camera, and depth camera) while processing and augmenting each modality separately. It then uses contrastive learning to extract robust features from the data. This architecture enables effective downstream task training with reduced labeled data, even in scenarios where certain sensors or modalities are unavailable. We validate our approach using the OMuSense-23 multimodal biometric dataset and evaluate its performance on tasks such as breathing pattern recognition and physiological classification. Our results show that the models perform comparably to fully supervised methods when using large amounts of labeled data and outperform them when using only a small percentage. In particular, with 1% of the labels, the model achieves 64% accuracy in breathing pattern classification, compared to 24% with a fully supervised approach. This work highlights the scalability and adaptability of self-supervised learning for physiological monitoring, making it particularly valuable for healthcare and well-being applications with limited labels or sensor availability. The code is publicly available at: https://gitlab.com/manulainen/ssl-physiological.

Information Fusion | Pub Date: 2025-06-20 | DOI: 10.1016/j.inffus.2025.103384 | Vol. 124, Article 103384
Title: Towards effective and efficient adversarial defense with diffusion models for robust visual tracking
Authors: Long Xu, Peng Gao, Wen-Jia Tang, Fei Wang, Ru-Yue Yuan
Abstract: Although deep learning-based visual tracking methods have made significant progress, they are vulnerable to carefully designed adversarial attacks, which can cause a sharp decline in tracking performance. To address this issue, this paper proposes DiffDf, the first adversarial defense method for visual tracking based on denoising diffusion probabilistic models, aimed at effectively improving the robustness of existing trackers against adversarial attacks. DiffDf establishes a multi-scale defense mechanism by combining a pixel-level reconstruction loss, a semantic consistency loss, and a structural similarity loss, effectively suppressing adversarial perturbations through a gradual denoising process. Extensive experimental results on several mainstream datasets show that DiffDf generalizes well across trackers with different architectures, significantly improving various evaluation metrics while achieving real-time inference speeds of over 30 FPS, demonstrating strong defense performance and efficiency. Code is available at https://github.com/pgao-lab/DiffDf.

Information Fusion | Pub Date: 2025-06-20 | DOI: 10.1016/j.inffus.2025.103398 | Vol. 124, Article 103398
Title: L³former: Enhanced multi-scale shared Transformer with Local Linear Layer for long-term series forecasting
Authors: Yulin Xia, Chang Wu, Xiaoman Yang
Abstract: Long-term time series forecasting is crucial in areas such as energy management and climate modeling. While multi-scale Transformer architectures have demonstrated success in long-term forecasting, they face challenges including high computational complexity and limited effectiveness in multi-scale decomposition and fusion. To address these challenges, we introduce L³former, a Transformer-based multi-scale shared network that integrates a Local Linear Layer (L³), a Scale-Wise Attention Mechanism (SWAM), and a Variable-Wise Feed-Forward Layer (VWFF). L³ is a novel neural network layer that independently aggregates temporal information within windows via local linear connections and shares weights across channels, using varying window sizes to construct multi-scale features. SWAM fuses these multi-scale features by assigning attention weights across scales. Moreover, all scales share a unified embedding space and backbone network, reducing model complexity. Finally, VWFF is incorporated into the standard Transformer encoder to mitigate the performance degradation caused by channel independence. On average across nine datasets, L³former outperforms recent state-of-the-art models, achieving 5.8%-16.7% lower MSE in long-term forecasting tasks.

Information Fusion | Pub Date: 2025-06-20 | DOI: 10.1016/j.inffus.2025.103363 | Vol. 124, Article 103363
Title: ECGFM: A foundation model for ECG analysis trained on a multi-center million-ECG dataset
Authors: Shaoting Zhang, Yishan Du, Wenji Wang, Xianying He, Fangfang Cui, Liang Zhao, Bei Wang, Zhiqiang Hu, Ziqiang Wang, Qing Xia, Tian Shen, Jie Zhao
Abstract: The electrocardiogram (ECG) is widely used for diagnosing heart conditions due to its cost-effectiveness, non-invasiveness, and accessibility. Between 2014 and 2017, the First Affiliated Hospital of Zhengzhou University collected over a million clinical ECGs from diverse primary hospitals, each accompanied by initial diagnostic results. Effectively utilizing this vast dataset, with its potential label inconsistencies, is a key challenge. In this study, we introduce ECGFM, a foundation model pre-trained on over a million clinical ECGs to achieve deep ECG comprehension. ECGFM comprises a convolutional encoder, a transformer decoder, and task-specific heads, pre-trained through three complementary sub-tasks: (i) contrastive predictive learning for unsupervised representation learning, (ii) normal/abnormal classification, and (iii) diagnostic text generation. Given potential label unreliability, active learning is integrated with the classification task to select key data for re-annotation, enhancing supervision quality. To enable ECGFM's adaptability to downstream tasks with any-lead inputs, a transferred convolutional encoder is trained to align feature distributions. ECGFM's effectiveness is evaluated on diverse public datasets, including PTB-XL, Georgia, CPSC, CinC 2020, MITDB, and the “Hefei High-tech Cup” dataset. Fine-tuning ECGFM (training only a task-specific head) delivers strong performance across datasets, approaching fully supervised methods. Additionally, in a one-month online test with 7951 recordings, ECGFM achieved a recall of 0.9335, a precision of 0.9571, and an F1-score of 0.9451, underscoring its robustness and potential for real-world applications.

Information Fusion | Pub Date: 2025-06-19 | DOI: 10.1016/j.inffus.2025.103375 | Vol. 124, Article 103375
Title: Auth-Graph: GenAI-empowered attribute-masked backdoor for on-demand authorizable graph learning
Authors: Xiao Yang, Gaolei Li, Kai Zhou, Yuni Lai, Jianhua Li
Abstract: Owing to its ability to fuse non-Euclidean node-edge information, Graph Learning (GL) is pervasively leveraged across applications including web recommendation, community detection, and molecular classification. Current GL paradigms strongly emphasize absolute fairness and impartiality for all clients. This limits their flexibility and adaptability in circumstances that demand customizable model queries (e.g., access control and intellectual property protection), where realizing authorizable GL models presents non-trivial obstacles. Inspired by Generative Artificial Intelligence (GenAI), we propose Auth-Graph, the first authorizable GL methodology, realized via an access control mechanism built into the model. Specifically, Auth-Graph employs a generative perturbation-driven backdoor to achieve authorizable access. Activation of the backdoor is confined exclusively to correctly masked and perturbed inputs, which yield accurate results, whereas all other inputs induce the GL model to produce erroneous outcomes. Moreover, to strengthen compatibility and support multi-user functionality, the masking mechanism operates correctly with a generative masker only for authorized users possessing valid tokens, with each user's token being uniquely distinct. Empirical results across benchmark GL models and datasets substantiate that Auth-Graph robustly prevents unauthorized access (average accuracy 3.68%) while allowing legitimate users to obtain standard outputs (average accuracy drop 3.45%).

Information Fusion | Pub Date: 2025-06-18 | DOI: 10.1016/j.inffus.2025.103438 | Vol. 124, Article 103438
Title: Multi-section myocardial status evaluation algorithm based on electrocardiogram and ultrasound image fusion
Authors: Mingjun Tian, Minjuan Zheng, Shi Qiu, Hongbing Lu
Abstract: The heart is a vital organ in the human body, and the myocardium is an essential component of it. The microcirculatory state of the myocardium is directly correlated with heart function, making its study significant. Currently, myocardial analysis relies predominantly on subjective assessments by physicians, lacking quantitative indicators and effective imaging techniques. To facilitate real-time observation of cardiac conditions, we propose a multi-section myocardial status evaluation algorithm based on electrocardiogram and ultrasound image fusion: 1) multi-section ultrasound images are temporally aligned using the electrocardiogram as a foundation for subsequent analyses; 2) a myocardial segmentation model incorporates both deep and shallow features, using multi-scale information to achieve precise myocardial extraction; and 3) a bullseye plot is constructed according to medical diagnostic standards, with quantitative assessment indicators displayed intuitively through color mapping. We compile an imaging dataset from 411 clinical groups. Two professional radiologists annotated the myocardial regions in a blinded manner, with their qualitative assessments of cardiac conduction status serving as the gold standard. Experiments show that: 1) the algorithm effectively segments the myocardium, achieving an Area Overlap Measure (AOM) of 94%, a 13% improvement over the EUnet model; and 2) the myocardial status assessment yields acceptable results, directly assisting diagnosis in 84% of cases and thereby enhancing the accuracy of physicians' findings.

Information Fusion | Pub Date: 2025-06-18 | DOI: 10.1016/j.inffus.2025.103402 | Vol. 125, Article 103402
Title: GrFormer: A novel Transformer on Grassmann manifold for infrared and visible image fusion
Authors: Huan Kang, Hui Li, Xiao-Jun Wu, Tianyang Xu, Rui Wang, Chunyang Cheng, Josef Kittler
Abstract: In the field of image fusion, promising progress has been made by modeling data from different modalities as linear subspaces. In practice, however, source images often lie in a non-Euclidean space, where Euclidean methods usually cannot capture the intrinsic topological structure. Typically, the inner product computed in Euclidean space measures algebraic rather than semantic similarity, which results in undesired attention outputs and a decrease in fusion performance. Moreover, the infrared and visible image fusion task must balance low-level details against high-level semantics. To address these issues, we propose GrFormer, a novel attention mechanism based on the Grassmann manifold for infrared and visible image fusion. Specifically, our method constructs a low-rank subspace mapping through projection constraints on the Grassmann manifold, compressing attention features into subspaces of varying rank levels. This forces the features to decouple into high-frequency details (local low-rank) and low-frequency semantics (global low-rank), thereby achieving multi-scale semantic fusion. Additionally, to integrate salient information effectively, we develop a cross-modal fusion strategy (CMS) based on a covariance mask that maximizes the complementary properties between modalities and suppresses highly correlated, and therefore redundant, features. Experimental results demonstrate that our network outperforms SOTA methods both qualitatively and quantitatively on multiple image fusion benchmarks. The code will be made available soon.
