Multimodal medical image fusion based on dilated convolution and attention-based graph convolutional network
Kaixin Jin, Xiwen Wang, Lifang Wang, Wei Guo, Qiang Han, Xiaoqing Yu
Computers & Electrical Engineering, Volume 124, Article 110359, May 2025
DOI: 10.1016/j.compeleceng.2025.110359
URL: https://www.sciencedirect.com/science/article/pii/S0045790625003027
Abstract
Current deep learning-based multimodal medical image fusion methods suffer from insufficient representation of low- and high-level semantic information and weak robustness to noisy data. To address these limitations, this paper proposes a multimodal medical image fusion method based on dilated convolution and an attention-based graph convolutional network, termed CGCN-Fusion. The framework comprises three main components: an encoder, a fusion module, and a decoder. The encoder pairs a low-level CNN encoder with a high-level GCN encoder to enhance the representation of semantic features at both levels. Specifically, the low-level CNN encoder uses dilated convolutions to strengthen its representation of low-level semantic information, enabling the model to better capture fine-grained image details. Meanwhile, the high-level GCN encoder exploits the strengths of graph convolutional networks to improve robustness to noise and integrates self-attention and multi-head attention mechanisms for a richer high-level semantic representation: the self-attention mechanism captures the semantic information of key nodes, while the multi-head attention incorporates structural encoding to model spatial-semantic relationships among nodes. The fusion module integrates the extracted features, and the decoder reconstructs the fused image. In comparisons with nine state-of-the-art methods, CGCN-Fusion delivers superior visual quality and objective performance, validating its effectiveness and clinical applicability in multimodal medical image fusion.
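The abstract describes the architecture but not its implementation. Below is a minimal PyTorch sketch of how the described pipeline could be organized: a low-level CNN encoder built from parallel dilated convolutions, a high-level GCN encoder that treats image patches as graph nodes and applies a graph-convolution step followed by multi-head self-attention with a learned structural encoding, and then a fusion module and convolutional decoder. All module names, channel widths, the patch-graph construction, and the similarity-based adjacency are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn


class DilatedCNNEncoder(nn.Module):
    """Low-level encoder sketch: parallel 3x3 convolutions with dilation
    rates 1, 2, and 4 enlarge the receptive field without losing
    resolution; a 1x1 convolution merges the branches."""

    def __init__(self, in_ch=1, ch=32):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 4)]
        )
        self.merge = nn.Conv2d(3 * ch, ch, 1)

    def forward(self, x):
        feats = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        return torch.relu(self.merge(feats))


class GCNAttentionEncoder(nn.Module):
    """High-level encoder sketch: non-overlapping patches become graph
    nodes, one graph-convolution step runs over a feature-similarity
    adjacency (an assumption), then multi-head self-attention with a
    learned structural encoding models relationships among nodes."""

    def __init__(self, ch=32, img_size=256, patch=16, heads=4):
        super().__init__()
        n = (img_size // patch) ** 2              # number of graph nodes
        self.unfold = nn.Unfold(patch, stride=patch)
        self.fold = nn.Fold(img_size, patch, stride=patch)
        self.node_proj = nn.Linear(ch * patch * patch, ch)
        self.gcn_weight = nn.Linear(ch, ch)
        self.struct_enc = nn.Parameter(torch.zeros(1, n, ch))
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)
        self.out_proj = nn.Linear(ch, ch * patch * patch)

    def forward(self, x):
        h = self.node_proj(self.unfold(x).transpose(1, 2))   # (B, n, ch)
        # one GCN step: row-normalized similarity adjacency, then H' = A H W
        adj = torch.softmax(h @ h.transpose(1, 2) / h.size(-1) ** 0.5, dim=-1)
        h = torch.relu(adj @ self.gcn_weight(h))
        # multi-head self-attention, with structural encoding added first
        h = h + self.struct_enc
        h, _ = self.attn(h, h, h)
        h = self.out_proj(h).transpose(1, 2)                 # (B, ch*p*p, n)
        return self.fold(h)                      # back to a feature map


class CGCNFusion(nn.Module):
    """Encoder (CNN + GCN streams) -> fusion module -> decoder."""

    def __init__(self, img_size=256, patch=16, ch=32):
        super().__init__()
        self.cnn = DilatedCNNEncoder(1, ch)
        self.gcn = GCNAttentionEncoder(ch, img_size, patch)
        self.fuse = nn.Conv2d(2 * ch, ch, 1)     # fusion-module placeholder
        self.decode = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid(),
        )

    def encode(self, x):
        low = self.cnn(x)                # low-level semantic features
        return low + self.gcn(low)       # add high-level graph features

    def forward(self, a, b):             # a, b: two registered modalities
        fused = self.fuse(torch.cat([self.encode(a), self.encode(b)], dim=1))
        return self.decode(fused)


# Usage: fuse two registered 256x256 single-channel images.
model = CGCNFusion()
mri, pet = torch.rand(1, 1, 256, 256), torch.rand(1, 1, 256, 256)
out = model(mri, pet)                    # fused image, shape (1, 1, 256, 256)
```

Combining the two encoder streams by addition and fusing modalities with a 1x1 convolution are placeholders; the abstract does not specify the paper's actual fusion module or decoder design.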
Journal introduction:
The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency.
Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.