Information Fusion: Latest Publications

SPAFPN: Wear a multi-scale feature fusion scarf around neck for real-time object detector
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date : 2025-02-25 DOI: 10.1016/j.inffus.2025.103034
Zhentian Bian, Bin Yao, Qing Li
Nowadays, the feature-extraction modules of backbone networks are developing rapidly in real-time object detection, while multi-scale designs for the neck iterate slowly. The traditional PAFPN falls short in cross-scale propagation efficiency, and many newly proposed multi-scale fusion methods that address this drawback are difficult to apply widely because of their complex fusion modules and unfriendly training. In this paper, we propose the Scarf Path Aggregation Feature Pyramid Network (SPAFPN), an advanced neck structure for multi-scale fusion in real-time object detection. SPAFPN adheres to the decentralized multi-scale fusion idea of "Light Fusion, Heavy Decouple" while inheriting the concepts of versatility and portable design. It promotes low-loss cross-scale transfer of features and improves model performance, and mainly consists of Pyramid Fusion and Multi-Concat modules. Experimental results on the MS COCO dataset show that SPAFPN-HG-N/S/M achieves a 5.2%/3.2%/1.1% mAP improvement over YOLOv8, respectively. We also adopted several recent backbone networks to verify the scalability of SPAFPN: combined with either C2f or GELAN as backbone modules, SPAFPN performs well, and it maintains good performance when applied to a DETR-based real-time object detector. The code is available at https://github.com/ztbian-bzt/SPAFPN.
(Information Fusion, Volume 119, Article 103034)
Citations: 0
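The "decentralized" fusion idea in the entry above, where every neck level aggregates information from all pyramid levels rather than only from its neighbors as in PAFPN, can be illustrated with a toy sketch in plain Python. This is an assumption-laden illustration, not the authors' Pyramid Fusion or Multi-Concat modules; `resize` and `fuse_all_scales` are invented names, and 1-D lists stand in for feature maps.

```python
# Toy sketch (not the SPAFPN code): decentralized multi-scale fusion,
# where every output scale concatenates features from ALL input scales,
# versus PAFPN's sequential neighbour-to-neighbour passing.
def resize(feat, size):
    """Nearest-neighbour resize of a 1-D feature list to `size` entries."""
    n = len(feat)
    return [feat[min(i * n // size, n - 1)] for i in range(size)]

def fuse_all_scales(pyramid):
    """Each level receives a concat of every level resized to its length."""
    return [
        [x for level in pyramid for x in resize(level, len(target))]
        for target in pyramid
    ]

p3, p4, p5 = [1.0] * 8, [2.0] * 4, [3.0] * 2   # toy pyramid levels
fused = fuse_all_scales([p3, p4, p5])
print([len(f) for f in fused])  # each level now carries 3x its width: [24, 12, 6]
```

A real neck would of course use learned convolutions instead of concatenation alone, but the data flow, all scales feeding each output level, is the point being sketched.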
Multi-objective math problem generation using large language model through an adaptive multi-level retrieval augmentation framework
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date : 2025-02-25 DOI: 10.1016/j.inffus.2025.103037
Jianwen Sun, Wangzi Shi, Xiaoxuan Shen, Shengyingjie Liu, Luona Wei, Qian Wan
Math problems are an important knowledge carrier and evaluation tool in personalized teaching, and the high cost of compiling them manually motivates research on math problem generation. Many previous studies focus on generating math word problems, which struggle to meet real teaching needs because of their single task-objective orientation and the small differences among generated results. By fusing external knowledge through retrieval-augmented generation (RAG), a large language model (LLM) can generate a variety of math problems, but the generated results still suffer from poor knowledge consistency, uncontrollability, and high computational cost. In this paper, we propose the task of multi-objective math problem generation (MMPG), which introduces the triple generation objective of "question type, knowledge point and difficulty" in response to real-world teaching needs. To the best of our knowledge, this is the first study to consider multiple objectives in the math problem generation process. Building on this, we design an adaptive multi-level retrieval augmentation framework (AMRAF) that enables an LLM to generate multi-objective math problems. This plug-and-play framework effectively improves generation performance without parameter tuning of the target model, thanks to fine-grained information retrieval and fusion. To verify the effectiveness of the proposed framework and provide a benchmark for subsequent research, we construct an MMPG dataset containing 9,000 samples. Experimental results demonstrate the superiority and effectiveness of our framework.
(Information Fusion, Volume 119, Article 103037)
Citations: 0
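The retrieval-augmentation pattern behind the entry above can be sketched minimally: score stored exemplars against the triple objective (question type, knowledge point, difficulty), then fuse the best matches into the LLM prompt. This is a hedged sketch under invented names and toy data (`EXEMPLARS`, `retrieve`, `build_prompt` are all assumptions), not the AMRAF framework itself, which performs multi-level, fine-grained retrieval.

```python
# Illustrative sketch (not the AMRAF code): retrieve exemplar problems that
# match the triple objective, then assemble them into a generation prompt.
EXEMPLARS = [
    {"type": "multiple-choice", "knowledge": "quadratic equations", "difficulty": "easy",
     "text": "Which value of x solves x^2 - 4 = 0? (a) 1 (b) 2 (c) 3"},
    {"type": "word-problem", "knowledge": "ratios", "difficulty": "medium",
     "text": "A recipe uses flour and sugar in a 3:1 ratio ..."},
    {"type": "multiple-choice", "knowledge": "quadratic equations", "difficulty": "hard",
     "text": "How many real roots does x^2 + x + 5 = 0 have? ..."},
]

def retrieve(objectives, pool, k=2):
    """Rank exemplars by how many of the three objectives they satisfy."""
    score = lambda ex: sum(ex[key] == objectives[key] for key in objectives)
    return sorted(pool, key=score, reverse=True)[:k]

def build_prompt(objectives, pool):
    """Fuse the retrieved exemplars and the objectives into one prompt."""
    shots = retrieve(objectives, pool)
    header = ("Generate one {type} math problem on {knowledge} "
              "at {difficulty} difficulty.".format(**objectives))
    return header + "\nExamples:\n" + "\n".join(ex["text"] for ex in shots)

prompt = build_prompt(
    {"type": "multiple-choice", "knowledge": "quadratic equations",
     "difficulty": "hard"},
    EXEMPLARS)
print(prompt.splitlines()[0])
```

The real framework replaces the exact-match scoring with learned, multi-level retrieval, but the plug-and-play property, conditioning the model through the prompt rather than tuning its parameters, is visible even here.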
Adaptive structural-guided multi-level representation learning with graph contrastive for incomplete multi-view clustering
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date : 2025-02-24 DOI: 10.1016/j.inffus.2025.103035
Haiyue Wang, Wensheng Zhang, Quan Wang, Xiaoke Ma
Incomplete multi-view clustering (IMC) is a pivotal task in machine learning that involves several unresolved challenges, such as object representation, relations among views, feature discriminability, and data restoration. To address these challenges, we propose a novel Adaptive Structural-guided Multi-level representation Learning with Graph Contrastive algorithm for IMC (ASMLGC), in which feature learning, data restoration, and clustering are integrated simultaneously. Concretely, ASMLGC learns multi-level representations of objects by extending auto-encoders, explicitly capturing the hierarchical structure of multi-view data and providing a better, more comprehensive strategy for characterizing data at multiple resolutions. Missing views are recovered by leveraging the multi-level representations, where global and local information are fully exploited to enhance the accuracy and robustness of imputation. Furthermore, ASMLGC employs graph contrastive learning to maximize intra-cluster consistency, constructing positive and negative samples from information at various resolutions, such as the feature level and the meta-structure level, thereby improving feature discriminability. Extensive experimental results confirm that ASMLGC outperforms baselines on benchmark datasets, particularly those with complicated hierarchical structure. The proposed algorithm can be applied to bioinformatics, medical image analysis, and social network analysis.
(Information Fusion, Volume 119, Article 103035)
Citations: 0
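The contrastive objective mentioned above, pulling an anchor toward its positive sample and pushing it from negatives, is commonly instantiated as an InfoNCE-style loss. The sketch below shows that generic loss in plain Python (a textbook formulation under an assumed temperature `tau`, not ASMLGC's multi-resolution variant).

```python
import math

# Minimal InfoNCE-style contrastive loss (generic sketch, not ASMLGC itself):
# -log( exp(sim(a,p)/tau) / (exp(sim(a,p)/tau) + sum_n exp(sim(a,n)/tau)) )
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, tau=0.5):
    logits = [cosine(anchor, positive) / tau] + [
        cosine(anchor, n) / tau for n in negatives]
    z = sum(math.exp(l) for l in logits)
    return -math.log(math.exp(logits[0]) / z)

a = [1.0, 0.0]
loss_aligned = info_nce(a, [0.9, 0.1], [[0.0, 1.0]])     # positive close to anchor
loss_misaligned = info_nce(a, [0.0, 1.0], [[0.9, 0.1]])  # positive orthogonal
print(loss_aligned < loss_misaligned)  # True: aligned positives give lower loss
```

ASMLGC's contribution sits in *how* positives and negatives are constructed (from feature-level and meta-structure-level information), which this sketch leaves abstract.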
CAM2Former: Fusion of Camera-specific Class Activation Map matters for occluded person re-identification
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date : 2025-02-24 DOI: 10.1016/j.inffus.2025.103011
Zichang Tan, Guiwei Zhang, Zihui Tan, Prayag Tiwari, Yi Wang, Yang Yang
Occluded person re-identification (ReID) is challenging because persons are frequently perturbed by various occlusions. Existing mainstream schemes prioritize the alignment of fine-grained body parts using error-prone, computation-intensive information, which can bring high estimation error and heavy computation. To this end, we present the Camera-specific Class Activation Map (CAM²), designed to identify critical foreground components with interpretability and computational efficiency. Building on this foundation, we introduce the CAM²-guided Vision Transformer, termed CAM²Former, with three core designs. First, we develop the Fusion of Camera-specific Class Activation Maps, termed CAM²Fusion, which consists of positive and negative CAM² operating in synergy to capture visual patterns representative of discriminative foreground components. Second, to enhance the representation of pivotal foreground components, we introduce a CAM²Fusion-attention mechanism that imposes sparse attention weights on identity-agnostic interference discerned by the positive and negative CAM². Third, since the enhancement of foreground representations in CAM²Former depends on camera-specific classifiers, which are unavailable during inference, we introduce a consistent learning scheme that aligns the representations derived from a vanilla ViT with those obtained via CAM²Former. This facilitates the extraction of discriminative foreground representations while circumventing CAM² dependencies during inference without additional complexity. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance on two occluded datasets (Occluded-Duke and Occluded-REID) and two holistic datasets (Market1501 and MSMT17), reaching an R1 of 74.4% and an mAP of 64.8% on Occluded-Duke.
(Information Fusion, Volume 120, Article 103011)
Citations: 0
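The classic class-activation-map computation that CAM² builds on can be shown in a few lines: weight each feature channel by the classifier weight for a target class and sum across channels, so spatial locations that drive the class score light up. This is a hedged sketch of the underlying CAM idea only, not the camera-specific positive/negative CAM²Fusion module; shapes and values are toy assumptions.

```python
# Generic class activation map (sketch of the classic CAM idea the paper
# extends, not CAM2Fusion): per-channel classifier weights re-weight the
# feature maps, and the channel-wise sum localises class-relevant regions.
def class_activation_map(feature_maps, class_weights):
    """feature_maps: C x H x W nested lists; class_weights: length-C list."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(wc * fm[i][j] for wc, fm in zip(class_weights, feature_maps))
             for j in range(w)] for i in range(h)]

# Two 2x2 channels; channel 0 fires on the foreground (top-left corner).
fmaps = [[[5.0, 0.0], [0.0, 0.0]],
         [[0.0, 1.0], [1.0, 1.0]]]
cam = class_activation_map(fmaps, [1.0, 0.1])  # class attends to channel 0
print(cam[0][0] > cam[1][1])  # True: the foreground location scores highest
```

The paper's twist is making such maps camera-specific and fusing positive and negative maps to suppress occluders, which this generic sketch does not attempt.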
From screens to scenes: A survey of embodied AI in healthcare
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date : 2025-02-21 DOI: 10.1016/j.inffus.2025.103033
Yihao Liu, Xu Cao, Tingting Chen, Yankai Jiang, Junjie You, Minghua Wu, Xiaosong Wang, Mengling Feng, Yaochu Jin, Jintai Chen
Healthcare systems worldwide face persistent challenges in efficiency, accessibility, and personalization. Modern artificial intelligence (AI) has shown promise in addressing these issues through precise predictive modeling; however, its impact remains constrained by limited integration into clinical workflows. Powered by modern AI technologies such as multimodal large language models and world models, Embodied AI (EmAI) represents a transformative frontier, offering enhanced autonomy and the ability to interact with the physical world to address these challenges. As an interdisciplinary and rapidly evolving research domain, "EmAI in healthcare" spans diverse fields such as algorithms, robotics, and biomedicine. This complexity underscores the importance of timely reviews and analyses to track advancements, address challenges, and foster cross-disciplinary collaboration. In this paper, we provide a comprehensive overview of the "brain" of EmAI for healthcare, introducing foundational AI algorithms for perception, actuation, planning, and memory, and presenting healthcare applications spanning clinical interventions, daily care & companionship, infrastructure support, and biomedical research, covering 35 specialized tasks. These advancements have the potential to enable personalized care, enhance diagnostic accuracy, and optimize treatment outcomes. Despite its promise, the development of EmAI for healthcare is hindered by critical challenges such as safety concerns, gaps between simulation platforms and real-world applications, the absence of standardized benchmarks, and uneven progress across interdisciplinary domains. We discuss the technical barriers and explore ethical considerations, offering a forward-looking perspective on the future of EmAI in healthcare. A hierarchical framework of intelligence levels for EmAI systems is also introduced to guide further development. By providing systematic insights, this work aims to inspire innovation and practical applications, paving the way for a new era of intelligent, patient-centered healthcare.
(Information Fusion, Volume 119, Article 103033)
Citations: 0
Deep multi-view clustering: A comprehensive survey of the contemporary techniques
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date : 2025-02-20 DOI: 10.1016/j.inffus.2025.103012
Anal Roy Chowdhury, Avisek Gupta, Swagatam Das
Data can be represented by multiple sets of features, where each semantically coherent set is called a view. For example, an image can be represented by feature sets that measure textures, shapes, edges, and so on. Collecting multiple views of data is generally easier than annotating it with the help of experts, so unsupervised exploration of data that consults all collected views is essential for identifying naturally occurring clusters of data instances. In deep multi-view clustering, deep neural networks obtain non-linear latent representations of data instances that agree with the multiple views, from which clusters are identified. A wide variety of such deep multi-view clustering approaches exist; we systematically study and categorize them into a novel taxonomy that gives structure to the existing literature and can guide future researchers. We provide a pedagogical discussion of preliminary concepts to help readers understand topics relevant to the studied deep clustering methods. Various multi-view problems under study are summarized, and future research directions are noted.
(Information Fusion, Volume 119, Article 103012)
Citations: 0
A review of medical text analysis: Theory and practice
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date : 2025-02-19 DOI: 10.1016/j.inffus.2025.103024
Yani Chen, Chunwu Zhang, Ruibin Bai, Tengfang Sun, Weiping Ding, Ruili Wang
Medical data analysis has emerged as an important driving force for smart healthcare, with applications ranging from disease analysis to triage, diagnosis, and treatment. Text data plays a crucial role in providing contexts and details that other data types cannot capture alone, making its analysis an indispensable resource in medical research. Natural language processing, a key technology for analyzing and interpreting text, is essential for extracting meaningful insights from medical text data. This systematic review explores the analysis of text data in medicine, focusing on applications of natural language processing methods. We retrieved a total of 4,784 publications from four databases; after applying rigorous exclusion criteria, 192 relevant publications were selected for in-depth analysis. These studies are evaluated from five critical perspectives: emerging trends in medical text analysis, commonly employed methodologies, major data sources, research topics, and applications to real-world problem-solving. Our analysis provides a comprehensive overview of the current state of medical text analysis, highlighting its advantages, limitations, and future potential. Finally, we identify key challenges and outline future research directions for advancing medical text analysis.
(Information Fusion, Volume 119, Article 103024)
Citations: 0
Enhancing cross-domain generalization by fusing language-guided feature remapping
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date : 2025-02-19 DOI: 10.1016/j.inffus.2025.103029
Ziteng Qiao, Dianxi Shi, Songchang Jin, Yanyan Shi, Luoxi Jing, Chunping Qiu
Domain generalization refers to training a model with annotated source-domain data so that it generalizes to various unseen target domains. It has been extensively studied in classification but remains challenging in object detection. Existing domain-generalization object detection methods mainly rely on generative or adversarial data augmentation, which increases the complexity of training. Recently, vision-language models (VLMs) such as CLIP have demonstrated strong cross-modal alignment capabilities, showing potential for enhancing domain generalization. On this basis, this paper proposes a language-guided feature remapping method that leverages VLMs to augment sample features and improve the generalization performance of regular models. In detail, we first construct a teacher-student network structure. We then introduce a feature remapping module that remaps sample features in both local and global spatial dimensions to improve the distribution of feature representations. Concurrently, we design domain prompts and class prompts to guide the sample features toward a more generalized and universal feature space. Finally, we establish a knowledge distillation structure to facilitate knowledge transfer between the teacher and student networks, enhancing the domain generalization ability of the student network. Multiple experimental results demonstrate the superiority of the proposed method.
(Information Fusion, Volume 119, Article 103029)
Citations: 0
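The knowledge-distillation step described above is typically realized as a KL divergence between temperature-softened teacher and student distributions. The sketch below shows that standard loss in plain Python; it is a generic textbook formulation under an assumed temperature `t`, not this paper's exact objective, which also involves language-guided feature remapping.

```python
import math

# Minimal knowledge-distillation loss (generic sketch, not the paper's exact
# objective): soften teacher and student logits with a temperature, then
# penalise the KL divergence KL(teacher || student).
def softmax(logits, t=1.0):
    exps = [math.exp(l / t) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, t=2.0):
    p = softmax(teacher_logits, t)   # soft teacher targets
    q = softmax(student_logits, t)   # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [2.0, 0.5, 0.1]
matched = kd_loss([2.0, 0.5, 0.1], teacher)   # student mimics the teacher
flipped = kd_loss([0.1, 0.5, 2.0], teacher)   # student disagrees
print(matched < flipped)  # True: agreement with the teacher lowers the loss
```

The temperature keeps the teacher's "dark knowledge" (relative scores of non-top classes) visible to the student, which is why distillation transfers more than the hard labels would.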
MSF-Net: Multi-stage fusion network for emotion recognition from multimodal signals in scalable healthcare
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date : 2025-02-19 DOI: 10.1016/j.inffus.2025.103028
Md. Milon Islam, Fakhri Karray, Ghulam Muhammad
Automatic emotion recognition has attracted significant interest in healthcare, thanks to recent developments in smart and innovative technologies. A real-time emotion recognition system allows continuous monitoring, comprehension, and enhancement of a person's capacities, along with ongoing advice for improving quality of life and well-being in personalized healthcare. Multimodal emotion recognition presents a significant challenge in efficiently using the diverse modalities present in the data. In this article, we introduce a Multi-Stage Fusion Network (MSF-Net) for emotion recognition capable of extracting multimodal information and achieving strong performance. We utilize a transformer-based structure to extract deep features from facial expressions and exploit two visual descriptors, Local Binary Pattern and Oriented FAST and Rotated BRIEF, to retrieve computer-vision features from the facial videos. A feature-level fusion network integrates the features extracted by these modules and directs the output into a triplet attention technique, which employs a three-branch architecture to compute attention weights that capture cross-dimensional interactions efficiently. The temporal dependencies in physiological signals are modeled by a Bi-directional Gated Recurrent Unit (Bi-GRU) in the forward and backward directions at each time step. Lastly, the output representations from the triplet attention module and the high-level patterns extracted by the Bi-GRU are fused and fed into the classification module to recognize emotion. Extensive experimental evaluations reveal that the proposed MSF-Net outperforms state-of-the-art approaches on two popular datasets, BioVid Emo DB and MGEED. Finally, we tested MSF-Net in an Internet of Things environment to facilitate real-world, scalable smart healthcare applications.
(Information Fusion, Volume 119, Article 103028)
Citations: 0
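The bidirectional recurrence used for the physiological signals above has a simple shape: run one scan forward and one backward over the sequence, then pair the hidden states per time step. The toy below substitutes a one-line leaky update for a real GRU cell (an explicit assumption), so it illustrates only the data flow of a Bi-GRU, not MSF-Net's trained model.

```python
# Toy bidirectional recurrence (illustrating the Bi-GRU data flow, not the
# MSF-Net implementation): scan forward and backward, then concatenate the
# two hidden states at every time step.
def scan(seq, step, h0=0.0):
    h, out = h0, []
    for x in seq:
        h = step(h, x)
        out.append(h)
    return out

step = lambda h, x: 0.5 * h + 0.5 * x        # stand-in for a GRU cell
signal = [1.0, 2.0, 3.0, 4.0]                # toy physiological samples
fwd = scan(signal, step)                     # left-to-right context
bwd = list(reversed(scan(list(reversed(signal)), step)))  # right-to-left
bidir = list(zip(fwd, bwd))                  # per-step (forward, backward) pair
print(len(bidir) == len(signal))  # True: one fused state per time step
```

Because each output pairs context from both directions, a classifier reading `bidir` sees the past and the future of every sample, which is the property the paper relies on.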
SSEFusion: Salient semantic enhancement for multimodal medical image fusion with Mamba and dynamic spiking neural networks
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date : 2025-02-19 DOI: 10.1016/j.inffus.2025.103031
Shiqiang Liu, Weisheng Li, Dan He, Guofen Wang, Yuping Huang
Multimodal medical image fusion technology enhances medical representations and plays a vital role in clinical diagnosis. However, fusing medical images remains a challenge due to the stochastic nature of lesions and the complex structures of organs. Although many fusion methods have been proposed recently, most struggle to establish global context dependency effectively while preserving salient semantic features, leading to the loss of crucial medical information. We therefore propose a novel salient semantic enhancement fusion (SSEFusion) framework, whose key components include a dual-branch encoder combining Mamba and spiking neural network (SNN) models (the Mamba-SNN encoder), feature interaction attention (FIA) blocks, and a decoder equipped with detail enhancement (DE) blocks. In the encoder, the Mamba-based branch introduces visual state space (VSS) blocks to efficiently establish global dependencies and extract global features for effective identification of lesion areas, while the SNN-based branch dynamically extracts fine-grained salient features to enhance the retention of medical semantic information in complex structures. Global features and fine-grained salient features interact semantically through the FIA blocks to achieve feature complementarity. Benefiting from the DE block, SSEFusion generates fused images with prominent edge textures. Furthermore, we propose a salient semantic loss based on leaky-integrate-and-fire (LIF) neurons to strengthen the guidance for extracting key features. Extensive fusion experiments show that SSEFusion outperforms state-of-the-art fusion methods in retaining salient medical semantic information. The code is available at https://github.com/Shiqiang-Liu/SSEFusion.
(Information Fusion, Volume 119, Article 103031)
Citations: 0
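The leaky-integrate-and-fire neuron behind the SNN branch and the proposed loss has a compact dynamics: the membrane potential leaks each step, integrates the input, and emits a spike with a reset once it crosses a threshold. The sketch below is the textbook discrete-time LIF model with assumed constants (`leak=0.8`, `threshold=1.0`), not the authors' implementation.

```python
# Textbook leaky-integrate-and-fire (LIF) neuron (a sketch of the building
# block SSEFusion uses, not the authors' code): v leaks, integrates input,
# and fires a spike with a hard reset at the threshold.
def lif(inputs, leak=0.8, threshold=1.0):
    v, spikes = 0.0, []
    for x in inputs:
        v = leak * v + x          # leaky integration of the input current
        if v >= threshold:        # fire and reset the membrane potential
            spikes.append(1)
            v = 0.0
        else:
            spikes.append(0)
    return spikes

print(lif([0.6, 0.6, 0.1, 0.9, 0.9]))  # → [0, 1, 0, 0, 1]
```

The sparse, event-driven output is what lets the SNN branch react to salient structures while staying cheap, and thresholded dynamics like these underlie the paper's salient semantic loss.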