{"title":"Dual-teacher self-distillation registration for multi-modality medical image fusion","authors":"Aimei Dong , Jingyuan Xu , Long Wang","doi":"10.1016/j.patcog.2025.112373","DOIUrl":null,"url":null,"abstract":"<div><div>Misaligned multimodal medical images pose challenges to the fusion task, resulting in structural distortions and edge artifacts in the fusion results. Existing registration networks primarily consider single-scale deformation fields at each stage, thereby neglecting long-range connections between non-adjacent stages. Moreover, in the fusion task, due to the quadratic computational complexity faced by Transformers during feature extraction, they are unable to effectively capture long-range correlated features. To address these problems, we propose an image registration and fusion method called DTMFusion. DTMFusion comprises two main networks: a Dual-Teacher Self-Distillation Registration (DTSDR) network and a Mamba-Conv-based Fusion (MCF) network. The registration network employs a pyramid progressive architecture to generate independent deformation fields at each layer. We introduce a dual-teacher self-distillation scheme that leverages past learning history and the current network structure as teacher guidance to constrain the generated deformation fields. For the fusion network, we introduced Mamba to address the quadratic complexity problem of Transformers. Specifically, the fusion network involves two key components: the Shallow Fusion Module (SFM) and the Cross-Modality Fusion Module (CFM). The SFM achieves lightweight cross-modality interaction through channel exchange, while the CFM leverages inherent cross-modality relationships to enhance the representation capability of fusion results. Through the collaborative effort of these components, the network can effectively integrate cross-modality complementary information and maintain appropriate apparent strength from a global perspective. Extensive experimental analysis demonstrates the superiority of this method in fusing misaligned medical images.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112373"},"PeriodicalIF":7.6000,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320325010349","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Misaligned multimodal medical images pose challenges for the fusion task, producing structural distortions and edge artifacts in the fused results. Existing registration networks primarily consider single-scale deformation fields at each stage and thereby neglect long-range connections between non-adjacent stages. Moreover, in the fusion task, Transformers incur quadratic computational complexity during feature extraction and therefore cannot effectively capture long-range correlated features. To address these problems, we propose an image registration and fusion method called DTMFusion. DTMFusion comprises two main networks: a Dual-Teacher Self-Distillation Registration (DTSDR) network and a Mamba-Conv-based Fusion (MCF) network. The registration network employs a pyramidal progressive architecture that generates an independent deformation field at each layer, and we introduce a dual-teacher self-distillation scheme that leverages past learning history and the current network structure as teacher guidance to constrain the generated deformation fields. For the fusion network, we introduce Mamba to sidestep the quadratic complexity of Transformers. Specifically, the fusion network consists of two key components: the Shallow Fusion Module (SFM) and the Cross-Modality Fusion Module (CFM). The SFM achieves lightweight cross-modality interaction through channel exchange, while the CFM exploits inherent cross-modality relationships to strengthen the representation of the fusion results. Through the collaboration of these components, the network can effectively integrate cross-modality complementary information while maintaining appropriate apparent intensity from a global perspective. Extensive experimental analysis demonstrates the superiority of this method in fusing misaligned medical images.
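Two mechanisms named in the abstract are concrete enough to sketch. First, the dual-teacher self-distillation constraint on the deformation fields. Below is a minimal PyTorch sketch, assuming the "learning history" teacher is an exponential moving average (EMA) of past student weights and the "network structure" teacher is a second, frozen forward pass of the current network; the MSE penalty, the 0.999 decay, and the equal loss weights are illustrative assumptions, not details stated in the abstract.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module,
               decay: float = 0.999) -> None:
    # "History" teacher: an exponential moving average of past student
    # weights (the 0.999 decay is an illustrative choice).
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s, alpha=1.0 - decay)


def dual_teacher_loss(student_field: torch.Tensor,
                      history_field: torch.Tensor,
                      structure_field: torch.Tensor,
                      w_hist: float = 0.5, w_struct: float = 0.5) -> torch.Tensor:
    # Constrain the student's deformation field with both teacher fields;
    # equal weights and an MSE penalty are assumptions, not the paper's loss.
    return (w_hist * F.mse_loss(student_field, history_field.detach())
            + w_struct * F.mse_loss(student_field, structure_field.detach()))
```

Second, the SFM's channel exchange. In the sketch below, a fixed fraction of channels is swapped between the two modality feature maps; the 50% ratio and the contiguous prefix mask are assumptions, since channel-exchange methods in the literature often select channels by batch-norm scaling factors instead.

```python
import torch


def channel_exchange(feat_a: torch.Tensor, feat_b: torch.Tensor,
                     ratio: float = 0.5):
    # Swap a fixed fraction of channels between two (B, C, H, W) modality
    # feature maps; torch.where keeps the operation differentiable.
    c = feat_a.shape[1]
    mask = torch.zeros(c, dtype=torch.bool, device=feat_a.device)
    mask[: int(c * ratio)] = True  # exchange the first `ratio` of channels
    m = mask[None, :, None, None]
    out_a = torch.where(m, feat_b, feat_a)
    out_b = torch.where(m, feat_a, feat_b)
    return out_a, out_b
```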
Journal introduction:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in related areas such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.