{"title":"Explainable variable-weight multi-modal based deep learning framework for catheter malposition detection","authors":"Yuhan Wang, Hak Keung Lam","doi":"10.1016/j.inffus.2025.103170","DOIUrl":null,"url":null,"abstract":"<div><div>Hospital patients may have catheters and lines inserted for quick administration of medicines or medical tests. However, a misplaced catheter can cause serious complications, even death. Recently, deep learning frameworks have shown their potential to assist in detecting catheter malposition in radiography. However, the deep learning malposition detection frameworks meet three main challenges: (1) Most approaches rely heavily on visual information, requiring models with many parameters for accurate detection. (2) Geometric information in radiography that is important for experts for decision making is often underutilized due to the inherent complexities in accurately extracting and integrating it with visual information. (3) Feature significance in catheter status detection is often underexplored, making the framework difficult to interpret and requiring a mechanism to highlight key factors influencing decisions. Therefore, to address these challenges, an explainable variable-weight multimodal based deep learning framework is proposed to fuse the visual and geometric information in the radiography for catheter malposition detection. The convolution neural network (CNN) stream and the graph convolution neural network (GCN) stream, with few learnable parameters, are designed to extract the visual and geometric information without compromising performance. The cross-modal attention block is proposed to capture the relationship between visual and geometric information. Furthermore, the multimodal variable-weight structure is proposed to fuse different modalities based on their significance. To visualize the contribution of each modality, the multimodal class activation map (MCAM) is designed to visualize the activated region in radiography, showing where the framework focuses. The proposed method obtains state-of-the-art performance, gaining 0.8816 mean AUC with 7.62 million parameters.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"122 ","pages":"Article 103170"},"PeriodicalIF":14.7000,"publicationDate":"2025-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S156625352500243X","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Hospital patients may have catheters and lines inserted for the quick administration of medicines or for medical tests. However, a misplaced catheter can cause serious complications, even death. Recently, deep learning frameworks have shown their potential to assist in detecting catheter malposition in radiographs. However, deep learning malposition detection frameworks face three main challenges: (1) most approaches rely heavily on visual information, requiring models with many parameters for accurate detection; (2) geometric information in radiographs, which is important for experts' decision making, is often underutilized because accurately extracting it and integrating it with visual information is inherently complex; and (3) the significance of individual features in catheter status detection is often underexplored, making such frameworks difficult to interpret and calling for a mechanism that highlights the key factors influencing decisions. To address these challenges, an explainable variable-weight multimodal deep learning framework is proposed to fuse the visual and geometric information in radiographs for catheter malposition detection. A convolutional neural network (CNN) stream and a graph convolutional network (GCN) stream, each with few learnable parameters, are designed to extract the visual and geometric information without compromising performance. A cross-modal attention block is proposed to capture the relationship between the visual and geometric information. Furthermore, a multimodal variable-weight structure is proposed to fuse the different modalities according to their significance. To visualize the contribution of each modality, a multimodal class activation map (MCAM) is designed to highlight the activated regions in the radiograph, showing where the framework focuses. The proposed method obtains state-of-the-art performance, achieving a mean AUC of 0.8816 with only 7.62 million parameters.
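The abstract gives no implementation details for the cross-modal attention block, but the general idea of letting one modality's features attend to the other's can be sketched generically. Below is a minimal, hypothetical PyTorch sketch, not the authors' code: the class name `CrossModalAttention`, the use of `nn.MultiheadAttention`, and all hyperparameters are assumptions for illustration only.

```python
# Hypothetical sketch (not the paper's implementation): one direction of
# cross-modal attention, where visual tokens query geometric tokens.
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    """Visual tokens (queries) attend to geometric tokens (keys/values)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual: torch.Tensor, geometric: torch.Tensor) -> torch.Tensor:
        # visual: (B, N_v, dim) flattened CNN feature tokens
        # geometric: (B, N_g, dim) GCN node embeddings (e.g., catheter keypoints)
        attended, _ = self.attn(query=visual, key=geometric, value=geometric)
        # Residual connection keeps the original visual stream intact.
        return self.norm(visual + attended)


# Usage: fuse 49 visual tokens with 16 geometric tokens of width 256.
block = CrossModalAttention(dim=256)
out = block(torch.randn(2, 49, 256), torch.randn(2, 16, 256))  # -> (2, 49, 256)
```

A symmetric block with the roles of the two modalities swapped would let the geometric stream attend to the visual stream as well.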
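Similarly, the variable-weight structure described above, which fuses modalities according to their significance, is commonly realized as input-conditioned scalar weights normalized with a softmax. The following sketch illustrates one such realization under that assumption; `VariableWeightFusion` and its details are hypothetical and not taken from the paper.

```python
# Hypothetical sketch: fuse per-modality embeddings with learned,
# input-dependent significance weights.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VariableWeightFusion(nn.Module):
    """Weight each modality embedding by a learned significance score."""

    def __init__(self, dim: int):
        super().__init__()
        # One scalar significance score per modality embedding.
        self.score = nn.Linear(dim, 1)

    def forward(self, modality_feats: list[torch.Tensor]) -> torch.Tensor:
        # modality_feats: list of (B, dim) pooled embeddings, one per modality
        stacked = torch.stack(modality_feats, dim=1)     # (B, M, dim)
        weights = F.softmax(self.score(stacked), dim=1)  # (B, M, 1), sums to 1 over M
        return (weights * stacked).sum(dim=1)            # (B, dim) fused representation


# Usage: fuse pooled visual and geometric embeddings of width 256.
fusion = VariableWeightFusion(dim=256)
fused = fusion([torch.randn(2, 256), torch.randn(2, 256)])  # -> (2, 256)
```

Because the weights are explicit and normalized, they can also serve the interpretability goal: per-sample weights indicate how much each modality contributed to a given decision.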
Journal overview:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.