{"title":"Dual-teacher self-distillation registration for multi-modality medical image fusion","authors":"Aimei Dong , Jingyuan Xu , Long Wang","doi":"10.1016/j.patcog.2025.112373","DOIUrl":null,"url":null,"abstract":"<div><div>Misaligned multimodal medical images pose challenges to the fusion task, resulting in structural distortions and edge artifacts in the fusion results. Existing registration networks primarily consider single-scale deformation fields at each stage, thereby neglecting long-range connections between non-adjacent stages. Moreover, in the fusion task, due to the quadratic computational complexity faced by Transformers during feature extraction, they are unable to effectively capture long-range correlated features. To address these problems, we propose an image registration and fusion method called DTMFusion. DTMFusion comprises two main networks: a Dual-Teacher Self-Distillation Registration (DTSDR) network and a Mamba-Conv-based Fusion (MCF) network. The registration network employs a pyramid progressive architecture to generate independent deformation fields at each layer. We introduce a dual-teacher self-distillation scheme that leverages past learning history and the current network structure as teacher guidance to constrain the generated deformation fields. For the fusion network, we introduced Mamba to address the quadratic complexity problem of Transformers. Specifically, the fusion network involves two key components: the Shallow Fusion Module (SFM) and the Cross-Modality Fusion Module (CFM). The SFM achieves lightweight cross-modality interaction through channel exchange, while the CFM leverages inherent cross-modality relationships to enhance the representation capability of fusion results. Through the collaborative effort of these components, the network can effectively integrate cross-modality complementary information and maintain appropriate apparent strength from a global perspective. Extensive experimental analysis demonstrates the superiority of this method in fusing misaligned medical images.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112373"},"PeriodicalIF":7.6000,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320325010349","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Misaligned multimodal medical images pose challenges for the fusion task, producing structural distortions and edge artifacts in the fused results. Existing registration networks primarily consider single-scale deformation fields at each stage and thereby neglect long-range connections between non-adjacent stages. Moreover, in the fusion task, Transformers incur quadratic computational complexity during feature extraction and therefore cannot effectively capture long-range correlated features. To address these problems, we propose an image registration and fusion method called DTMFusion. DTMFusion comprises two main networks: a Dual-Teacher Self-Distillation Registration (DTSDR) network and a Mamba-Conv-based Fusion (MCF) network. The registration network employs a pyramidal progressive architecture that generates an independent deformation field at each layer, and we introduce a dual-teacher self-distillation scheme that leverages past learning history and the current network structure as teacher guidance to constrain the generated deformation fields. For the fusion network, we introduce Mamba to sidestep the quadratic complexity of Transformers. Specifically, the fusion network consists of two key components: the Shallow Fusion Module (SFM) and the Cross-Modality Fusion Module (CFM). The SFM achieves lightweight cross-modality interaction through channel exchange, while the CFM exploits inherent cross-modality relationships to strengthen the representation of the fusion results. Through the collaboration of these components, the network can effectively integrate cross-modality complementary information while maintaining appropriate apparent intensity from a global perspective. Extensive experimental analysis demonstrates the superiority of this method in fusing misaligned medical images.
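Two mechanisms named in the abstract are concrete enough to sketch. First, the dual-teacher self-distillation constraint on the deformation fields. Below is a minimal PyTorch sketch, assuming the "learning history" teacher is an exponential moving average (EMA) of past student weights and the "network structure" teacher is a second, frozen forward pass of the current network; the MSE penalty, the 0.999 decay, and the equal loss weights are illustrative assumptions, not details stated in the abstract.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module,
               decay: float = 0.999) -> None:
    # "History" teacher: an exponential moving average of past student
    # weights (the 0.999 decay is an illustrative choice).
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s, alpha=1.0 - decay)


def dual_teacher_loss(student_field: torch.Tensor,
                      history_field: torch.Tensor,
                      structure_field: torch.Tensor,
                      w_hist: float = 0.5, w_struct: float = 0.5) -> torch.Tensor:
    # Constrain the student's deformation field with both teacher fields;
    # equal weights and an MSE penalty are assumptions, not the paper's loss.
    return (w_hist * F.mse_loss(student_field, history_field.detach())
            + w_struct * F.mse_loss(student_field, structure_field.detach()))
```

Second, the SFM's channel exchange. In the sketch below, a fixed fraction of channels is swapped between the two modality feature maps; the 50% ratio and the contiguous prefix mask are assumptions, since channel-exchange methods in the literature often select channels by batch-norm scaling factors instead.

```python
import torch


def channel_exchange(feat_a: torch.Tensor, feat_b: torch.Tensor,
                     ratio: float = 0.5):
    # Swap a fixed fraction of channels between two (B, C, H, W) modality
    # feature maps; torch.where keeps the operation differentiable.
    c = feat_a.shape[1]
    mask = torch.zeros(c, dtype=torch.bool, device=feat_a.device)
    mask[: int(c * ratio)] = True  # exchange the first `ratio` of channels
    m = mask[None, :, None, None]
    out_a = torch.where(m, feat_b, feat_a)
    out_b = torch.where(m, feat_a, feat_b)
    return out_a, out_b
```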
Journal introduction:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in related areas such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.