Information Fusion: Latest Articles

Interpretable breast cancer diagnosis using histopathology and lesion mask as domain concepts conditional simulation ultrasonography
IF 14.7 · Q1 Computer Science
Information Fusion Pub Date : 2025-05-20 DOI: 10.1016/j.inffus.2025.103343
Guowei Dai, Chaoyu Wang, Qingfeng Tang, Yi Zhang, Duwei Dai, Lang Qiao, Jiaojun Yan, Hu Chen
Breast cancer diagnosis using ultrasound imaging presents challenges due to inherent limitations in image quality and the complex nature of lesion interpretation. We propose SgmaFuse, a novel interpretable multimodal framework that integrates histopathological concepts and lesion mask information, treated as domain concepts, with ultrasound imaging for accurate and explainable breast cancer diagnosis. At its core, SgmaFuse employs a Spatially Guided Multi-Level Alignment Mechanism (SGMLAM) that orchestrates global–local feature interactions across modalities. This is achieved through a hierarchical strategy incorporating cross-modal fusion and attention-based feature correspondence at four distinct levels: global image–report alignment, local mask-guided attention report alignment, local image diagnostic report alignment, and concept-level diagnostic report alignment. Concurrently, a Histological Semantic Activation Vector Learning (HSAVL) module, leveraging kernel Support Vector Machines, learns discriminative semantic concepts directly from histopathological data, thereby bridging the gap between ultrasound imaging features and established pathological patterns via robust concept-level alignment. The framework's ability to provide transparent, structured diagnostic explanations through interpretable visual attention maps and semantic concept contributions demonstrates its potential as a reliable clinical decision support tool, particularly in the challenging domain of breast ultrasound diagnosis.
Information Fusion, Vol. 123, Article 103343.
Citations: 0
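The HSAVL idea of scoring a histological concept with a kernel machine can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: it substitutes a simple kernel perceptron for the kernel SVM, and all function names and toy feature vectors are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def train_kernel_perceptron(X, y, gamma=1.0, epochs=20):
    # y in {-1, +1}; returns dual coefficients alpha (SVM analogue).
    K = rbf_kernel(X, X, gamma)
    alpha = np.zeros(len(X))
    for _ in range(epochs):
        for i in range(len(X)):
            if np.sign(K[i] @ (alpha * y)) != y[i]:
                alpha[i] += 1.0
    return alpha

def concept_score(x, X_train, y, alpha, gamma=1.0):
    # Signed activation of one histological concept for a new feature vector:
    # positive means the concept is present, negative means absent.
    k = rbf_kernel(x[None, :], X_train, gamma)[0]
    return float(k @ (alpha * y))
```

The signed score plays the role of a concept activation that can then be aligned against ultrasound features at the concept level.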
Fuzzy processing applied to improve multimodal sensor data fusion to discover frequent behavioral patterns for smart healthcare
IF 14.7 · Q1 Computer Science
Information Fusion Pub Date : 2025-05-20 DOI: 10.1016/j.inffus.2025.103307
Carlos Fernandez-Basso, David Díaz-Jimenez, Jose L. López, Macarena Espinilla
The extraction and utilization of latent information from sensor data is gaining increasing prominence due to its potential for transforming decision-making processes across various sectors. Data mining techniques provide robust tools for analyzing large-scale data generated by advanced network management systems, offering actionable insights that drive operational efficiency and strategic improvements. However, the sheer volume of sensor data, combined with challenges related to real-world sensor deployment and user interaction, necessitates the development of advanced data fusion and processing frameworks. This paper presents an innovative automatic fusion and fuzzification methodology designed to integrate multi-source sensor data into coherent, high-quality intelligent outputs. By applying fuzzy logic, the proposed system enhances the interpretability and interoperability of complex sensor datasets. The approach has been validated in a real-world scenario within sensorized homes of Type II diabetic patients in Cabra (Córdoba, Spain), where it aids healthcare professionals in monitoring and optimizing patient routines. Experimental results demonstrate the system's effectiveness in identifying and analyzing behavioral patterns, highlighting its potential to improve patient care through advanced sensor data fusion techniques.
Information Fusion, Vol. 123, Article 103307.
Citations: 0
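The fuzzification step can be illustrated with triangular membership functions. The linguistic terms and breakpoints below are invented for illustration only; they are not the paper's actual rule base.

```python
def tri_membership(x, a, b, c):
    # Triangular membership: rises from a to a peak at b, falls to zero at c.
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify_temperature(t):
    # Hypothetical linguistic terms for an ambient-temperature sensor reading.
    return {
        "cold":        tri_membership(t, -5.0, 5.0, 15.0),
        "comfortable": tri_membership(t, 10.0, 20.0, 30.0),
        "hot":         tri_membership(t, 25.0, 35.0, 45.0),
    }
```

A raw reading such as 12.5 °C then maps to partial memberships in several terms at once, which is what lets frequent-pattern mining operate on robust linguistic labels instead of brittle numeric thresholds.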
Multi-scale dual-attention frequency fusion for joint segmentation and deformable medical image registration
IF 14.7 · Q1 Computer Science
Information Fusion Pub Date : 2025-05-20 DOI: 10.1016/j.inffus.2025.103293
Hongchao Zhou, Shiyu Liu, Shunbo Hu
Deformable medical image registration is a crucial aspect of medical image analysis. Improving the accuracy and plausibility of registration by information fusion is still a problem that needs to be addressed. To solve this problem, we propose DAFF-Net, a novel framework that systematically unifies three kinds of information fusion (low-level fusion, high-level fusion, and loss fusion) to enhance registration precision and plausibility: (i) low-level fusion: DAFF-Net employs a shared global encoder to extract common anatomical features from both moving and fixed images for the two tasks, reducing redundancy and ensuring foundational consistency across tasks; (ii) high-level fusion: through the dual attention frequency fusion (DAFF) module, DAFF-Net dynamically combines multi-scale registration and segmentation features, leverages low-frequency structural coherence and high-frequency boundary details, and adaptively reweights them to enhance registration via global and local attention mechanisms; (iii) loss fusion: a unified loss function enforces bidirectional consistency, i.e., segmentation supervises registration through anatomical constraints, while registration refines segmentation via deformation-corrected anatomical consistency. Extensive experiments on three public 3D brain magnetic resonance imaging (MRI) datasets demonstrate that the proposed DAFF-Net and its unsupervised variant outperform state-of-the-art registration methods across several evaluation metrics, demonstrating the effectiveness of our approach in deformable medical image registration. The proposed framework holds promise for practical clinical applications such as preoperative planning, longitudinal disease tracking, and structural analysis in neurological disorders.
Information Fusion, Vol. 123, Article 103293.
Citations: 0
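The low-/high-frequency separation with adaptive reweighting that DAFF-Net builds on can be sketched in a deliberately simplified form. The FFT-based split and scalar softmax gating below are assumptions for illustration; the paper's actual DAFF module uses learned global and local attention over multi-scale features.

```python
import numpy as np

def frequency_split(feat, radius=0.25):
    # Split a 2-D feature map into low- and high-frequency parts
    # by masking a centered disc in the shifted Fourier spectrum.
    H, W = feat.shape
    F = np.fft.fftshift(np.fft.fft2(feat))
    yy, xx = np.mgrid[0:H, 0:W]
    dist = np.hypot(yy - H / 2, xx - W / 2)
    low_mask = dist <= radius * min(H, W)
    low = np.real(np.fft.ifft2(np.fft.ifftshift(F * low_mask)))
    high = feat - low          # residual carries boundary detail
    return low, high

def attention_fuse(low, high):
    # Scalar gates from the mean energy of each band, softmax-normalized,
    # standing in for the learned attention reweighting.
    e = np.array([np.abs(low).mean(), np.abs(high).mean()])
    w = np.exp(e) / np.exp(e).sum()
    return w[0] * low + w[1] * high
```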
Hyperspectral super-resolution via nonlinear unmixing
IF 14.7 · Q1 Computer Science
Information Fusion Pub Date : 2025-05-20 DOI: 10.1016/j.inffus.2025.103295
Qingke Zou, Jie Zhou, Mingjie Luo
Fusing a hyperspectral image (HSI) with a multispectral image (MSI) to produce a super-resolution image (SRI) that possesses both fine spatial and spectral resolutions is a widely adopted technique in hyperspectral super-resolution (HSR). Most existing HSR methods accomplish this task within the framework of the linear mixing model (LMM). However, a severe challenge lies in the inherent linear constraint of the LMM, which hinders the adaptability of these HSR methods to complex real-world scenarios. In this work, the LMM is extended to the generalized bilinear model (GBM), and a novel HSR method based on nonnegative tensor factorization is proposed in the framework of nonlinear unmixing. Apart from the linear part, it additionally considers the main nonlinear interactions, that is, the bilinear interactions between the endmembers. Crucially, each potential decomposition factor possesses a physical interpretation, enabling the incorporation of prior information to enhance reconstruction performance. Furthermore, an HSR algorithm has been devised specifically for scenarios where the spatial degradation operators from SRI to HSI are unknown, which undoubtedly enhances its practical applicability. The proposed methods overcome the inherent linear limitations of the LMM framework while avoiding the information loss associated with matricizing the HSI and MSI. The effectiveness of the proposed methods is showcased through simulated and real data.
Information Fusion, Vol. 123, Article 103295.
Citations: 0
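The forward model itself is standard. Assuming the common GBM form y = Ea + Σ_{i&lt;j} γ_ij a_i a_j (e_i ⊙ e_j), where E holds endmember spectra, a holds abundances, and γ scales each pairwise interaction, one pixel mixes as:

```python
import numpy as np

def gbm_mix(E, a, gamma):
    # Generalized bilinear model for one pixel: linear mixture plus
    # pairwise bilinear endmember interactions scaled by gamma[i, j].
    L, R = E.shape              # L spectral bands, R endmembers
    y = E @ a                   # linear (LMM) part
    for i in range(R):
        for j in range(i + 1, R):
            y += gamma[i, j] * a[i] * a[j] * (E[:, i] * E[:, j])
    return y
```

Setting gamma to zero recovers the plain LMM, which is exactly the sense in which the GBM is a strict generalization.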
LPM-Net: Lightweight pixel-level modeling network based on CNN and Mamba for 3D medical image fusion
IF 14.7 · Q1 Computer Science
Information Fusion Pub Date : 2025-05-20 DOI: 10.1016/j.inffus.2025.103306
Mingwei Wen, Xuming Zhang
Deep learning-based medical image fusion has become a prevalent approach to facilitate computer-aided diagnosis and treatment. Mainstream image fusion methods predominantly rely on encoder–decoder architectures and utilize unsupervised loss functions for training, resulting in the blurring or loss of fused image details and limited inference speed. To resolve these problems, this paper presents a pixel-level modeling network for effective fusion of 3D medical images. The network comprises three structurally identical branches: an unsupervised fusion branch and two supervised reconstruction branches. In the fusion branch, the feature extraction modules utilize a dense convolutional neural network and Mamba to extract image features based on axis decomposition. The base and detail components are then predicted from these extracted features and fused to generate the fused image pixel by pixel. Notably, the two reconstruction branches share the parameters of the feature extraction modules with the fusion branch and provide the supervised loss, which is integrated with the unsupervised loss to enhance fusion performance. Experiments on six datasets spanning multiple modalities and organs demonstrate that our method achieves effective medical image fusion by preserving image details, minimizing image blurring, and reducing the number of parameters. Meanwhile, our method shows significant advantages over the compared mainstream methods on eight fusion metrics and provides relatively fast inference (e.g., 90 volumes/s on the BraTS2020 dataset). Our method thus offers a valuable means to improve the accuracy and efficiency of image fusion-based diagnosis and treatment systems. The source code is available on GitHub at https://github.com/coolllcat/LPM-Net.
Information Fusion, Vol. 123, Article 103306.
Citations: 0
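The base/detail split that the fusion branch predicts per pixel can be illustrated with a crude hand-crafted variant: a mean-filter base layer, a residual detail layer, averaged bases, and max-absolute detail selection. This is an assumption-laden illustration of the decomposition idea, not LPM-Net's learned prediction.

```python
import numpy as np

def base_detail(img, k=3):
    # Base layer via a k x k mean filter (edge-replicated padding);
    # the detail layer is the residual.
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    H, W = img.shape
    base = np.zeros((H, W), dtype=float)
    for i in range(H):
        for j in range(W):
            base[i, j] = p[i:i + k, j:j + k].mean()
    return base, img - base

def fuse_pixelwise(img_a, img_b):
    # Average the smooth bases; keep the stronger detail at each pixel.
    ba, da = base_detail(img_a)
    bb, db = base_detail(img_b)
    detail = np.where(np.abs(da) >= np.abs(db), da, db)
    return 0.5 * (ba + bb) + detail
```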
FABRF-Net: A frequency-aware boundary and region fusion network for breast ultrasound image segmentation
IF 14.7 · Q1 Computer Science
Information Fusion Pub Date : 2025-05-20 DOI: 10.1016/j.inffus.2025.103299
Yan Liu, Yan Yang, Yongquan Jiang, Xiaole Zhao, Zhuyang Xie
Breast ultrasound (BUS) image segmentation is crucial for tumor analysis and cancer diagnosis. However, the challenges of lesion segmentation in BUS images arise from inter-class indistinction caused by low contrast, high speckle noise, artifacts, and blurred boundaries, as well as intra-class inconsistency due to variations in lesion size, shape, and location. To address these challenges, we propose a novel frequency-aware boundary and region fusion network (FABRF-Net). The core of our FABRF-Net is the frequency domain-based Haar wavelet decomposition module (HWDM), which effectively captures multi-scale frequency feature information from global spatial contexts. This allows our network to integrate the advantages of CNNs and Transformers for more comprehensive frequency and spatial feature modeling, effectively addressing intra-class inconsistency. Moreover, the frequency awareness based on HWDM is used to separate features into boundary and region streams, enhancing detailed edges in boundary features and reducing the impact of noise on lesion region features. We further develop a boundary-region fusion module (BRFM) to enable adaptive fusion and mutual guidance of frequency-aware region and boundary features, effectively mitigating inter-class indistinction and achieving accurate breast lesion segmentation. Quantitative and qualitative experimental results demonstrate that FABRF-Net achieves state-of-the-art segmentation accuracy on six cross-domain ultrasound datasets and has obvious advantages in segmenting small breast tumors.
Information Fusion, Vol. 123, Article 103299.
Citations: 0
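A one-level 2-D Haar decomposition, the building block of an HWDM-style module, can be written directly in NumPy in its averaging/differencing form (sub-band scaling conventions vary between implementations):

```python
import numpy as np

def haar2d(x):
    # One-level 2-D Haar decomposition into LL, LH, HL, HH sub-bands.
    # Assumes even height and width.
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # vertical average
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # vertical difference
    LL = (a[:, 0::2] + a[:, 1::2]) / 2.0  # smooth approximation
    LH = (a[:, 0::2] - a[:, 1::2]) / 2.0  # horizontal edges
    HL = (d[:, 0::2] + d[:, 1::2]) / 2.0  # vertical edges
    HH = (d[:, 0::2] - d[:, 1::2]) / 2.0  # diagonal detail
    return LL, LH, HL, HH
```

The LL band carries the low-frequency region information while LH/HL/HH carry the high-frequency boundary detail, which is exactly the split FABRF-Net exploits to route features into region and boundary streams.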
Towards eXplicitly eXplainable Artificial Intelligence
IF 14.7 · Q1 Computer Science
Information Fusion Pub Date : 2025-05-18 DOI: 10.1016/j.inffus.2025.103352
Vyacheslav L. Kalmykov, Lev V. Kalmykov
Artificial Intelligence (AI) plays a leading role in Industry 4.0 and the future Industry 5.0. Concerns about the opacity of today's neural network AI solutions have led to the Explainable AI (XAI) project, which attempts to open the black box of neural networks. While XAI can help to partially interpret and explain the workings of neural networks, it has not changed their original subsymbolic nature and the opaque statistical nature of their workings. Significant uncertainties remain about the safety, reliability, and accountability of modern neural network AI solutions. Here we present a novel AI method that has a fully transparent white-box nature: eXplicitly eXplainable Artificial Intelligence (XXAI). XXAI is implemented on deterministic cellular automata whose rules are based on first principles of the problem domain. XXAI overcomes the limitations for a broader application of symbolic AI. The practical value of XXAI lies in its ability to make autonomous, fully transparent decisions due to its multi-component, multi-level, networked, hyper-logical nature. Looking ahead, XXAI has the potential to become a leading strategic partner in the field of neuro-symbolic hybrid AI systems. XXAI is able to systematically validate neural network solutions, ensuring that the required standards of reliability, security, and ethics are met throughout the AI lifecycle, from training to deployment. By creating a clear cognitive framework, XXAI will enable the development of advanced autonomous solutions to achieve the human-centric values of the future Industry 5.0. A comprehensive program for the further development of the proposed approach is presented.
Information Fusion, Vol. 123, Article 103352.
Citations: 0
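A deterministic cellular automaton with fully explicit rules, the substrate XXAI runs on, is short enough to write out in full. The Wolfram rule-110 update below is a generic stand-in for the paper's domain-specific first-principles rules; it shows why every state transition is inspectable by construction.

```python
def step_rule(cells, rule=110):
    # One synchronous update of a 1-D binary cellular automaton under a
    # Wolfram rule number, with periodic boundaries. The rule's lookup
    # table is fully explicit: bit k of `rule` gives the next state for
    # the 3-cell neighborhood whose binary encoding is k.
    n = len(cells)
    table = [(rule >> k) & 1 for k in range(8)]
    return [
        table[(cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n]]
        for i in range(n)
    ]
```

Because the entire "model" is this lookup table, any decision trace can be audited step by step, which is the transparency property the paper contrasts with subsymbolic networks.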
Harnessing the potential of multimodal EHR data: A comprehensive survey of clinical predictive modeling for intelligent healthcare
IF 14.7 · Q1 Computer Science
Information Fusion Pub Date : 2025-05-17 DOI: 10.1016/j.inffus.2025.103283
Jialun Wu, Kai He, Rui Mao, Xuequn Shang, Erik Cambria
The digitization of healthcare has led to the accumulation of vast amounts of patient data through Electronic Health Records (EHR) systems, creating significant opportunities for advancing intelligent healthcare. Recent breakthroughs in deep learning and information fusion techniques have enabled the seamless integration of diverse data sources, providing richer insights for clinical decision-making. This review offers a comprehensive analysis of predictive modeling approaches that leverage multimodal EHR data, focusing on the latest methodologies and their practical applications. We classify the current advancements from both task-driven and method-driven perspectives, while distilling key challenges and motivations that have fueled these innovations. This exploration examines the real-world impact of advanced technologies in healthcare, addressing issues from data integration to task formulation, challenges, and method refinement. The role of information fusion in enhancing model performance is also emphasized. Building on the discussions and findings, we highlight promising future research directions critical for advancing multimodal fusion technologies in clinical predictive modeling, addressing the complex challenges of real-world clinical environments, and moving toward universal intelligence in healthcare.
Information Fusion, Vol. 123, Article 103283.
Citations: 0
PNCD: Mitigating LLM hallucinations in noisy environments–A medical case study
IF 14.7 · Q1 Computer Science
Information Fusion Pub Date : 2025-05-16 DOI: 10.1016/j.inffus.2025.103328
Jiayi Qu, Jun Liu, Xiangjun Liu, Meihui Chen, Jinchi Li, Jintao Wang
Although large language models (LLMs) have demonstrated impressive reasoning capabilities, the generated responses may contain inaccurate or fictitious information due to noise and redundancy in the data that can interfere with the model's reasoning. Noise is often difficult to avoid in massive data, and manual denoising requires a lot of time, manpower, and material resources. Particularly in the medical and legal domains, specialized textual data requires a greater ability to cope with the hallucinations of LLMs. The ability to maintain the accuracy of information in noisy environments without distorting, modifying, or introducing creative elements is particularly critical. In this paper, we propose an Adaptive Positive and Negative weight Contrast Decoding (PNCD) method based on RAG to address LLM hallucination in noisy environments. Specifically, we construct a set of expert and non-expert LLMs for the BaseLLMs: expert LLMs extract information from the set of correct examples, while non-expert LLMs extract information from the set of negative examples that induce hallucinations in BaseLLMs. Their goal is to eliminate noisy information and identify redundant information in the output space of BaseLLMs, guiding BaseLLMs to generate more accurate factual content. We assign enhancement weights to the expert LLM parameter distributions and penalty weights to the non-expert LLM parameter distributions, to amplify the prediction effect of expert LLMs, suppress that of non-expert LLMs, and determine the final correct prediction of the next token. In addition, a KV buffer is introduced to reduce resource consumption. Experimental results show that PNCD achieves state-of-the-art results on the medical dataset, with an inference speed of 32 tokens/sec on a single card (RTX 4070TiS). The GPU memory footprint is 0.84 GB (initially 4.2 GB). The method also shows some generalization capability on legal datasets and several public datasets.
Information Fusion, Vol. 123, Article 103328.
Citations: 0
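The positive/negative weighting of PNCD can be approximated in logit space. The paper applies its weights to parameter distributions and uses a KV buffer, so the alpha/beta logit combination below is a simplified analogue of the contrast-decoding idea, not the paper's exact mechanism.

```python
import numpy as np

def pncd_logits(base, expert, nonexpert, alpha=0.5, beta=0.5):
    # Contrast decoding: reward tokens the expert model favors and
    # penalize tokens the hallucination-inducing non-expert favors.
    return base + alpha * expert - beta * nonexpert

def next_token(base, expert, nonexpert, alpha=0.5, beta=0.5):
    # Greedy pick over the contrasted distribution (softmax for inspection).
    z = pncd_logits(np.asarray(base, float), np.asarray(expert, float),
                    np.asarray(nonexpert, float), alpha, beta)
    p = np.exp(z - z.max())
    p /= p.sum()
    return int(p.argmax()), p
```

When the base model is indifferent between two tokens, the expert pulls the choice toward the factually supported one and the non-expert's preference is subtracted away.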
Multimodal graph representation learning for robust surgical workflow recognition with adversarial feature disentanglement
IF 14.7 · Q1 Computer Science
Information Fusion Pub Date : 2025-05-16 DOI: 10.1016/j.inffus.2025.103290
Long Bai, Boyi Ma, Ruohan Wang, Guankun Wang, Beilei Cui, Zhongliang Jiang, Mobarakol Islam, Zhe Min, Jiewen Lai, Nassir Navab, Hongliang Ren
Surgical workflow recognition is vital for automating tasks, supporting decision-making, and training novice surgeons, ultimately improving patient safety and standardizing procedures. However, data corruption can lead to performance degradation due to issues like occlusion from bleeding or smoke in surgical scenes and problems with data storage and transmission. Therefore, a robust workflow recognition model is urgently needed. In this case, we explore a robust graph-based multimodal approach to integrating vision and kinematic data to enhance accuracy and reliability. Vision data captures dynamic surgical scenes, while kinematic data provides precise movement information, overcoming limitations of visual recognition under adverse conditions. We propose a multimodal Graph Representation network with Adversarial feature Disentanglement (GRAD) for robust surgical workflow recognition in challenging scenarios with domain shifts or corrupted data. Specifically, we introduce a Multimodal Disentanglement Graph Network (MDGNet) that captures fine-grained visual information while explicitly modeling the complex relationships between vision and kinematic embeddings through graph-based message modeling. To align feature spaces across modalities, we propose a Vision-Kinematic Adversarial (VKA) framework that leverages adversarial training to reduce modality gaps and improve feature consistency. Furthermore, we design a Contextual Calibrated Decoder, incorporating temporal and contextual priors to enhance robustness against domain shifts and corrupted data. Extensive comparative and ablation experiments demonstrate the effectiveness of our model and proposed modules. Specifically, we achieved an accuracy of 86.87% and 92.38% on two public datasets, respectively. Moreover, our robustness experiments show that our method effectively handles data corruption during storage and transmission, exhibiting excellent stability and robustness. Our approach aims to advance automated surgical workflow recognition, addressing the complexities and dynamism inherent in surgical procedures.
Information Fusion, Vol. 123, Article 103290.
Citations: 0
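Adversarial modality alignment of the kind the VKA framework performs is commonly implemented with gradient reversal. A minimal sketch under that assumption: a logistic discriminator tries to tell vision features from kinematic features, and the encoder receives the negated gradient, pushing the two feature distributions together. This standard GRL formulation is an illustration, not the paper's exact design.

```python
import numpy as np

def grl_grads(features, modality, w):
    # features: (N, D) fused embeddings; modality: (N,) labels in {0, 1}
    # (e.g. 0 = kinematic, 1 = vision); w: (D,) discriminator weights.
    z = features @ w
    p = 1.0 / (1.0 + np.exp(-z))                 # P(modality = 1)
    err = (p - modality) / len(modality)
    grad_w = features.T @ err                     # trains the discriminator
    grad_feat = -np.outer(err, w)                 # reversed sign: the encoder is
    return grad_w, grad_feat                      # trained to FOOL the discriminator
```

Descending `grad_w` sharpens the modality classifier, while feeding `grad_feat` back through the encoder erases modality-specific cues, which is the alignment effect the adversarial training aims for.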