Information Fusion. Pub Date: 2025-05-14. DOI: 10.1016/j.inffus.2025.103277
Qi Li, Bojian Chen, Qitong Chen, Xuan Li, Zhaoye Qin, Fulei Chu
{"title":"HSE: A plug-and-play module for unified fault diagnosis foundation models","authors":"Qi Li , Bojian Chen , Qitong Chen , Xuan Li , Zhaoye Qin , Fulei Chu","doi":"10.1016/j.inffus.2025.103277","DOIUrl":"10.1016/j.inffus.2025.103277","url":null,"abstract":"<div><div>Intelligent Fault Diagnosis (IFD) plays a crucial role in industrial applications, where developing foundation models analogous to ChatGPT for comprehensive fault diagnosis remains a significant challenge. Current IFD methodologies are constrained by their inability to construct unified models capable of processing heterogeneous signal types, varying sampling rates, and diverse signal lengths across different equipment. To address these limitations, we propose a novel Heterogeneous Signal Embedding (HSE) module that projects heterogeneous signals into a unified signal space, offering seamless integration with existing IFD architectures as a plug-and-play solution. The HSE framework comprises two primary components: the Temporal-Aware Patching (TAP) module for embedding heterogeneous signals into a unified space, and the Cross-Dimensional Patch Fusion (CDPF) module for fusing embedded signals with temporal information into unified representations. We validate the efficacy of HSE through two comprehensive case studies: a simulation signal dataset and three distinct bearing datasets with heterogeneous features. Our experimental results demonstrate that HSE significantly enhances traditional fault diagnosis models, improving both diagnostic accuracy and generalization capability. While conventional approaches necessitate separate models for specific signal types, sampling frequencies, and signal lengths, HSE-enabled architectures successfully learn unified representations across diverse signal. The results from bearing fault diagnosis applications confirm substantial improvements in both diagnostic precision and cross-dataset generalization. As a pioneering contribution toward IFD foundation models, the proposed HSE framework establishes a fundamental architecture for advancing unified fault diagnosis systems.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103277"},"PeriodicalIF":14.7,"publicationDate":"2025-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144099099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information Fusion. Pub Date: 2025-05-13. DOI: 10.1016/j.inffus.2025.103282
Shenlu Zhao, Jingyi Wang, Qiang Zhang, Jungong Han
{"title":"Towards efficient RGB-T semantic segmentation via feature generative distillation strategy","authors":"Shenlu Zhao , Jingyi Wang , Qiang Zhang , Jungong Han","doi":"10.1016/j.inffus.2025.103282","DOIUrl":"10.1016/j.inffus.2025.103282","url":null,"abstract":"<div><div>Recently, multimodal knowledge distillation-based methods for RGB-T semantic segmentation have been developed to enhance segmentation performance and inference speeds. Technically, the crux of these models lies in the feature imitative distillation-based strategies, where the student models imitate the working principles of the teacher models through loss functions. Unfortunately, due to the significant gaps in the representation capability between the student and teacher models, such feature imitative distillation-based strategies may not achieve the anticipatory knowledge transfer performance in an efficient way. In this paper, we propose a novel feature generative distillation strategy for efficient RGB-T semantic segmentation, embodied in the Feature Generative Distillation-based Network (FGDNet), which includes a teacher model (FGDNet-T) and a student model (FGDNet-S). This strategy bridges the gaps between multimodal feature extraction and complementary information excavation by using Conditional Variational Auto-Encoder (CVAE) to generate teacher features from student features. Additionally, Multimodal Complementarity Separation modules (MCS-L and MCS-H) are introduced to separate complementary features at different levels. Comprehensive experimental results on four public benchmarks demonstrate that, compared with mainstream RGB-T semantic segmentation methods, our FGDNet-S achieves competitive segmentation performance with lower number of parameters and computational complexity.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103282"},"PeriodicalIF":14.7,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144068226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information Fusion. Pub Date: 2025-05-13. DOI: 10.1016/j.inffus.2025.103278
Zhong Chen, Xiaolei Zhang, Xueru Xu, Hanruo Chen, Xiaofei Mi, Jian Yang
{"title":"Registration-aware cross-modal interaction network for optical and SAR images","authors":"Zhong Chen , Xiaolei Zhang , Xueru Xu , Hanruo Chen , Xiaofei Mi , Jian Yang","doi":"10.1016/j.inffus.2025.103278","DOIUrl":"10.1016/j.inffus.2025.103278","url":null,"abstract":"<div><div>The registration of optical and synthetic aperture radar (SAR) images is valuable for exploration due to the inherent complementarity of optical and SAR imagery. However, the substantial radiation and geometric differences between the two modalities present a major obstacle to image registration. Specifically, images from optical and SAR require integration of precise local features and registration-aware global features, and features within and across modalities need to be interacted with efficiently to achieve accurate registration. To tackle this problem, we build a Robust Quadratic Net (RQ-Net) based on the paradigm of describe-then-detect, which is of dual-encoder–decoder design, the first encoder is responsible for encoding local features within each modality through vanilla convolutional operators, while the other is an elaborated Multilayer Cross-modal Registration-aware (MCR) encoder specialized in building global relationships both inner- and inter-modalities, which is conducted effectively at various scales to extract informative features for registration. Furthermore, to cooperate with the network’s training for more well-suited registration feature descriptors, we propose a reconsider loss to review whether the least similar positive feature pairs are matchable and make the RQ-Net achieve a higher matching capability. Through extensive qualitative and quantitative experiments on three paired optical and SAR datasets, RQ-Net has been validated as superior in extracting sufficient features for matching and improving image success registration rates while maintaining low registration errors.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103278"},"PeriodicalIF":14.7,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143941126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information Fusion. Pub Date: 2025-05-13. DOI: 10.1016/j.inffus.2025.103276
Jindou Zhang, Ruiqian Zhang, Xiao Huang, Zhizheng Zhang, Bowen Cai, Xianwei Lv, Zhenfeng Shao, Deren Li
{"title":"Joint content-aware and difference-transform lightweight network for remote sensing images semantic change detection","authors":"Jindou Zhang , Ruiqian Zhang , Xiao Huang , Zhizheng Zhang , Bowen Cai , Xianwei Lv , Zhenfeng Shao , Deren Li","doi":"10.1016/j.inffus.2025.103276","DOIUrl":"10.1016/j.inffus.2025.103276","url":null,"abstract":"<div><div>Advancements in Earth observation technology have enabled effective monitoring of complex surface changes. Semantic change detection (SCD) using high-resolution remote sensing images is crucial for urban planning and environmental monitoring. However, existing deep learning-based SCD methods, which combine semantic segmentation (SS) and binary change detection (BCD), face challenges in lightweight design and consistency between semantic and change results, limiting their accuracy and applicability. To overcome these limitations, we propose the Joint Content-Aware and Difference-Transform Lightweight Network (CDLNet). CDLNet features a lightweight architecture, skip connections, and a multi-task decoding mechanism. The Temporal-Spatial Content-Aware Fusion module (TSAF) in the SS decoding branch incorporates change information to improve semantic classification accuracy within change regions. The Multi-Type Temporal Difference-Transform module (MTDT) in the BCD decoding branch enhances change localization for accurate SCD through efficient transformation of temporal difference features. Experiments on the SECOND, HiUCD mini, MSSCD, and Landsat-SCD datasets demonstrate that CDLNet outperforms thirteen state-of-the-art methods, achieving average improvements of 1.41%, 1.53% and 1.49% in the <span><math><mrow><mi>F</mi><msub><mrow><mn>1</mn></mrow><mrow><mi>s</mi><mi>c</mi><mi>d</mi></mrow></msub></mrow></math></span>, <span><math><mrow><mi>I</mi><mi>o</mi><mi>U</mi><mi>c</mi></mrow></math></span> and <span><math><mrow><mi>S</mi><mi>c</mi><mi>o</mi><mi>r</mi><mi>e</mi></mrow></math></span> metrics, respectively. Ablation studies confirm the effectiveness of the TSAF and MTDT modules and the rationality of multi-task loss weight configuration. Furthermore, CDLNet utilizes only 20% of the parameters (12.88M) and 7.5% of the FLOPs (30.11G) of the leading model, achieving an inference speed of 41 FPS, which underscores its superior lightweight characteristics. The results indicate that CDLNet offers excellent detection performance, generalization, and robustness within a lightweight framework. The code of our paper is accessible at: <span><span>https://github.com/zjd1836/CDLNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103276"},"PeriodicalIF":14.7,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144068227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information Fusion. Pub Date: 2025-05-12. DOI: 10.1016/j.inffus.2025.103279
Yingxiao Qiao, Qian Zhao
{"title":"A self-supervised data augmentation strategy for EEG-based emotion recognition","authors":"Yingxiao Qiao, Qian Zhao","doi":"10.1016/j.inffus.2025.103279","DOIUrl":"10.1016/j.inffus.2025.103279","url":null,"abstract":"<div><div>Due to the scarcity problem of electroencephalogram (EEG) data, building high-precision emotion recognition models using deep learning faces great challenges. In recent years, data augmentation has significantly enhanced deep learning performance. Therefore, this paper proposed an innovative self-supervised data augmentation strategy, named SSDAS-EER, to generate high-quality and various artificial EEG feature maps. Firstly, EEG feature maps were constructed by combining differential entropy (DE) and power spectral density (PSD) features to obtain rich spatial and spectral information. Secondly, a masking strategy was used to mask part of the EEG feature maps, which prompted the designed generative adversarial network (GAN) to focus on learning the unmasked feature information and effectively filled in the masked parts. Meanwhile, the elaborated GAN could accurately capture the distribution characteristics of spatial and spectral information, thus ensuring the quality of the generated artificial EEG feature maps. In particular, this paper introduced a self-supervised learning mechanism to further optimize the designed classifier with good generalization ability to the generated samples. This strategy integrated data augmentation and model training into an end-to-end pipeline capable of augmenting EEG data for each subject. In this study, a systematic experiment was conducted on the DEAP dataset, and the results showed that the proposed method achieved an average accuracy of 97.27% and 97.45% on all subjects in valence and arousal, respectively, which was 1.46% and 1.39% higher compared to the time before the strategy was applied. Simultaneously, the similarity between the generated EEG feature maps and the original EEG feature maps was verified. These results indicated that SSDAS-EER had significant performance improvement in EEG emotion recognition tasks, demonstrating its great potential in improving the efficiency of EEG data utilization and emotion recognition accuracy.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103279"},"PeriodicalIF":14.7,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144084094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information Fusion. Pub Date: 2025-05-12. DOI: 10.1016/j.inffus.2025.103281
Yuqun Yang, Jichen Xu, Mengyuan Xu, Xu Tang, Bo Wang, Kechen Shu, Zheng You
{"title":"FSVS-Net: A few-shot semi-supervised vessel segmentation network for multiple organs based on feature distillation and bidirectional weighted fusion","authors":"Yuqun Yang , Jichen Xu , Mengyuan Xu , Xu Tang , Bo Wang , Kechen Shu , Zheng You","doi":"10.1016/j.inffus.2025.103281","DOIUrl":"10.1016/j.inffus.2025.103281","url":null,"abstract":"<div><div>Accurate 3D vessel mapping is essential for surgical planning and interventional treatments. However, the conventional manual slice-by-slice annotation in CT scans is extremely time-consuming, due to the complexity of vessels: sparse distribution, intricate 3D topology, varying sizes, irregular shapes, and low contrast with the background. To address this problem, we propose a few-shot semi-supervised vessel segmentation network (FSVS-Net) applicable to multiple organs. It can leverage a few annotated slices to segment vessel regions in unannotated slices, enabling efficient semi-supervised processing of the entire CT sequences. Specifically, we propose a feature distillation module for FSVS-Net to enhance vessel-specific semantic representations and suppress irrelevant background features. In addition, we design a bidirectional weighted fusion strategy that propagates information from a few annotated slices to unannotated ones in both opposite directions of the CT sequence, effectively modeling 3D vessel continuity and reducing error accumulation. Extensive experiments on three datasets (hepatic vessels, pulmonary vessels, and renal arteries) demonstrate that FSVS-Net achieves state-of-the-art performance in few-shot vessel segmentation task, significantly outperforming existing methods. We collected and annotated three vessel datasets, including clinical data from Tsinghua Changgung Hospital and public sources (e.g., MSD08), for this study. In practice, it reduces the average annotation time from 2 h to 0.5 h per volume, improving efficiency by 4<span><math><mo>×</mo></math></span>. We release three organ-specific vessel datasets and the implementation code at: <span><span>https://github.com/YqunYang/FSVS-Net</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103281"},"PeriodicalIF":14.7,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144071140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information Fusion. Pub Date: 2025-05-11. DOI: 10.1016/j.inffus.2025.103274
Yifeng Wang, Yi Zhao
{"title":"General pre-trained inertial signal feature extraction based on temporal memory fusion","authors":"Yifeng Wang, Yi Zhao","doi":"10.1016/j.inffus.2025.103274","DOIUrl":"10.1016/j.inffus.2025.103274","url":null,"abstract":"<div><div>Inertial sensors are widely used in smartphones, robotics, wearables, aerospace systems, and industrial automation. However, extracting universal features from inertial signals remains challenging. Inertial signal features are encoded in abstract, unreadable waveforms, lacking the visual intuitiveness of images, which makes semantic extraction difficult. The non-stationary nature and complex motion patterns further complicate the feature extraction process. Moreover, the lack of large-scale annotated inertial datasets limits deep learning models to learn universal features and generalize them across expansive applications of inertial sensors. To this end, we propose a Topology Guided Feature Extraction (TG-FE) approach for general inertial signal feature extraction. TG-FE fuses time-series information into graph representations, constructing a Memory Graph by emulating the complex network characteristics of human memory. Guided by small-world network principles, this graph integrates local and global information while sparsity constraints emphasize critical feature interactions. The Memory Graph preserves nonlinear relationships and higher-order dependencies, enabling the model to generalize across scenarios with minimal task-specific tuning. Furthermore, a Cross-Graph Feature Fusion mechanism integrates information across stacked TG-FE modules to enhance representation ability and ensure stable gradient flow. With self-supervised pre-training, the TG-FE modules require only minimal fine-tuning to adapt to various hardware configurations and task scenarios, consistently outperforming comparison methods across all evaluations. Compared to the current state-of-the-art method, our TG-FE achieves 11.7% and 20.0% error reduction in attitude and displacement estimation tasks. Notably, TG-FE achieves an order-of-magnitude advantage in stability evaluations, maintaining robust performance even under 20% noise conditions where competing methods degrade significantly. Overall, this work offers a solution for general inertial signal feature extraction and opens new avenues for applying graph-based deep learning to capture and represent sequential signal features.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103274"},"PeriodicalIF":14.7,"publicationDate":"2025-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143936280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory recall: Retrieval-Augmented mind reconstruction for brain decoding","authors":"Yuxiao Zhao , Guohua Dong , Lei Zhu , Xiaomin Ying","doi":"10.1016/j.inffus.2025.103280","DOIUrl":"10.1016/j.inffus.2025.103280","url":null,"abstract":"<div><div>Reconstructing visual stimuli from functional magnetic resonance imaging (fMRI) is a complex challenge in neuroscience. Most existing approaches rely on mapping neural signals to pretrained models to generate latent variables, which are then used to reconstruct images via a diffusion model. However, this multi-step process can result in the loss of crucial semantic details, limiting reconstruction accuracy. In this paper, we introduce a novel brain decoding framework, called Memory Recall (MR), inspired by bionic brain mechanisms. MR mimics the human visual perception process, where the brain retrieves stored visual experiences to compensate for incomplete visual cues. Initially, low- and high-level visual cues are extracted using spatial mapping techniques based on VAE and CLIP, replicating the brain’s ability to interpret degraded stimuli. A visual experience database is then created to retrieve complementary information that enriches these high-level representations, simulating the brain’s memory retrieval process. Finally, an Attentive Visual Signal Fusion Network (AVSFN) with a novel attention scoring mechanism integrates the retrieved information, enhancing the generative model’s performance and emulating the brain’s refinement of visual perception. Experimental results show that MR outperforms state-of-the-art models across multiple evaluation metrics and subjective assessments. Moreover, our results provide new evidence supporting a well-known psychological conclusion that the basic information capacity of short-term memory is four items, further demonstrating the informativeness and interpretability of our model.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103280"},"PeriodicalIF":14.7,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143936278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-supervised representation learning for geospatial objects: A survey","authors":"Yile Chen , Weiming Huang , Kaiqi Zhao , Yue Jiang , Gao Cong","doi":"10.1016/j.inffus.2025.103265","DOIUrl":"10.1016/j.inffus.2025.103265","url":null,"abstract":"<div><div>The proliferation of various data sources in urban and territorial environments has significantly facilitated the development of geospatial artificial intelligence (GeoAI) across a wide range of geospatial applications. However, geospatial data, which is inherently linked to geospatial objects, often exhibits data heterogeneity that necessitates specialized fusion and representation strategies while simultaneously being inherently sparse in labels for downstream tasks. Consequently, there is a growing demand for techniques that can effectively leverage geospatial data without heavy reliance on task-specific labels and model designs. This need aligns with the principles of self-supervised learning (SSL), which has garnered increasing attention for its ability to learn effective and generalizable representations directly from data without extensive labeled supervision. This paper presents a comprehensive and up-to-date survey of SSL techniques specifically applied to or developed for geospatial objects in three primary vector geometric types: <em>Point</em>, <em>Polyline</em>, and <em>Polygon</em>. We systematically categorize various SSL techniques into predictive and contrastive methods, and analyze their adaptation to different data types for representation learning across various downstream tasks. Furthermore, we examine the emerging trends in SSL for geospatial objects, particularly the gradual advancements towards geospatial foundation models. Finally, we discuss key challenges in current research and outline promising directions for future investigation. By offering a structured analysis of existing studies, this paper aims to inspire continued progress in integrating SSL with geospatial objects, and the development of geospatial foundation models in a longer term.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103265"},"PeriodicalIF":14.7,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143936277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information Fusion. Pub Date: 2025-05-09. DOI: 10.1016/j.inffus.2025.103303
Tahir Mahmood, Ganbayar Batchuluun, Seung Gu Kim, Jung Soo Kim, Kang Ryoung Park
{"title":"A lightweight hierarchical feature fusion network for surgical instrument segmentation in internet of medical things","authors":"Tahir Mahmood, Ganbayar Batchuluun, Seung Gu Kim, Jung Soo Kim, Kang Ryoung Park","doi":"10.1016/j.inffus.2025.103303","DOIUrl":"10.1016/j.inffus.2025.103303","url":null,"abstract":"<div><div>Minimally invasive surgeries (MIS) enhance patient outcomes but pose challenges such as limited visibility, complex hand-eye coordination, and manual endoscope control. The rise of the Internet of Medical Things (IoMT) and telesurgery further demands efficient and lightweight solutions. To address these limitations, we propose a novel lightweight hierarchical feature fusion network (LHFF-Net) for surgical instrument segmentation. LHFF-Net integrates high-, mid-, and low-level encoder features through three novel modules: the multiscale feature aggregation (MFA) module which can capture fine-grained and coarse features across scales, the enhanced spatial attention (ESA) module, prioritizing critical spatial regions, and the enhanced edge module (EEM), refining boundary delineation.</div><div>The proposed model was evaluated on two benchmark datasets, Kvasir-Instrument and UW-Sinus-Surgery, achieving mean Dice coefficients (mDC) of 97.87 % and 88.83 %, respectively, along with mean intersection over union (mIOU) scores of 95.87 % and 84.33 %. These results highlight LHFF-Net’s ability to deliver high segmentation accuracy while maintaining computational efficiency with only 2.2 million parameters. This combination of performance and efficiency makes LHFF-Net a robust solution for IoMT applications, enabling real-time telesurgery and driving innovations in healthcare.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103303"},"PeriodicalIF":14.7,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143936279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}