UTSRMorph: A Unified Transformer and Superresolution Network for Unsupervised Medical Image Registration

Runshi Zhang, Hao Mo, Junchen Wang, Bimeng Jie, Yang He, Nenghao Jin, and Liang Zhu
{"title":"UTSRMorph: A Unified Transformer and Superresolution Network for Unsupervised Medical Image Registration","authors":"Runshi Zhang;Hao Mo;Junchen Wang;Bimeng Jie;Yang He;Nenghao Jin;Liang Zhu","doi":"10.1109/TMI.2024.3467919","DOIUrl":null,"url":null,"abstract":"Complicated image registration is a key issue in medical image analysis, and deep learning-based methods have achieved better results than traditional methods. The methods include ConvNet-based and Transformer-based methods. Although ConvNets can effectively utilize local information to reduce redundancy via small neighborhood convolution, the limited receptive field results in the inability to capture global dependencies. Transformers can establish long-distance dependencies via a self-attention mechanism; however, the intense calculation of the relationships among all tokens leads to high redundancy. We propose a novel unsupervised image registration method named the unified Transformer and superresolution (UTSRMorph) network, which can enhance feature representation learning in the encoder and generate detailed displacement fields in the decoder to overcome these problems. We first propose a fusion attention block to integrate the advantages of ConvNets and Transformers, which inserts a ConvNet-based channel attention module into a multihead self-attention module. The overlapping attention block, a novel cross-attention method, uses overlapping windows to obtain abundant correlations with match information of a pair of images. Then, the blocks are flexibly stacked into a new powerful encoder. The decoder generation process of a high-resolution deformation displacement field from low-resolution features is considered as a superresolution process. Specifically, the superresolution module was employed to replace interpolation upsampling, which can overcome feature degradation. UTSRMorph was compared to state-of-the-art registration methods in the 3D brain MR (OASIS, IXI) and MR-CT datasets (abdomen, craniomaxillofacial). The qualitative and quantitative results indicate that UTSRMorph achieves relatively better performance. The code and datasets are publicly available at <uri>https://github.com/Runshi-Zhang/UTSRMorph</uri>.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 2","pages":"891-902"},"PeriodicalIF":0.0000,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on medical imaging","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10693635/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Complicated image registration is a key problem in medical image analysis, and deep learning-based methods have achieved better results than traditional methods. These methods are mainly ConvNet-based or Transformer-based. Although ConvNets can effectively exploit local information and reduce redundancy via small-neighborhood convolutions, their limited receptive field prevents them from capturing global dependencies. Transformers can establish long-range dependencies via a self-attention mechanism; however, computing the relationships among all tokens is expensive and highly redundant. To overcome these problems, we propose a novel unsupervised image registration method, the unified Transformer and superresolution (UTSRMorph) network, which enhances feature representation learning in the encoder and generates detailed displacement fields in the decoder. We first propose a fusion attention block that integrates the advantages of ConvNets and Transformers by inserting a ConvNet-based channel attention module into a multihead self-attention module. The overlapping attention block, a novel cross-attention method, uses overlapping windows to obtain rich correlations and matching information from a pair of images. These blocks are then flexibly stacked to form a new, powerful encoder. In the decoder, the generation of a high-resolution displacement field from low-resolution features is treated as a superresolution process: a superresolution module replaces interpolation-based upsampling, overcoming feature degradation. UTSRMorph was compared with state-of-the-art registration methods on 3D brain MR (OASIS, IXI) and MR-CT (abdomen, craniomaxillofacial) datasets. Qualitative and quantitative results indicate that UTSRMorph achieves comparatively better performance. The code and datasets are publicly available at https://github.com/Runshi-Zhang/UTSRMorph.
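The fusion attention block combines a ConvNet-style channel attention module with multihead self-attention. As a rough illustration of that idea only (the squeeze-and-excitation-style channel attention, the reduction ratio, and the residual placement below are assumptions, not the paper's exact design), a minimal PyTorch-style sketch could look like this:

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # Squeeze-and-excitation-style channel attention over a token sequence
    # (illustrative; the paper's ConvNet-based channel attention may differ).
    def __init__(self, dim, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim),
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, N, C) tokens
        w = self.fc(x.mean(dim=1))             # global pooling over tokens -> (B, C)
        return x * w.unsqueeze(1)              # reweight channels

class FusionAttentionBlock(nn.Module):
    # Channel attention inserted into a multihead self-attention block.
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mhsa = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ca = ChannelAttention(dim)

    def forward(self, x):                      # x: (B, N, C)
        y = self.norm(x)
        attn, _ = self.mhsa(y, y, y)           # token-wise (spatial) attention
        return x + self.ca(attn)               # channel attention refines the output

x = torch.randn(2, 512, 96)                    # e.g., 512 tokens with 96 channels
out = FusionAttentionBlock(96, num_heads=4)(x) # same shape as x

The intent captured here is that self-attention models relationships between spatial tokens while the channel attention reweights feature channels, so the two are complementary rather than redundant.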
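The decoder treats upsampling of the displacement field as a superresolution step rather than plain interpolation. The sketch below shows one common interpolation-free upsampler, a 3D sub-pixel (voxel-shuffle) layer generalizing PixelShuffle to volumes; the layer name, kernel size, and scale factor are assumptions, and this is not necessarily the exact module used in UTSRMorph:

import torch
import torch.nn as nn

class VoxelShuffleUp(nn.Module):
    # Sub-pixel upsampling for 3D feature maps: a convolution expands channels by
    # r**3, then a voxel-shuffle rearranges them onto a finer grid.
    # Illustrative stand-in for a superresolution-style upsampler.
    def __init__(self, in_ch, out_ch, r=2):
        super().__init__()
        self.r = r
        self.conv = nn.Conv3d(in_ch, out_ch * r ** 3, kernel_size=3, padding=1)

    def forward(self, x):                          # x: (B, C_in, D, H, W)
        x = self.conv(x)                           # (B, C_out * r^3, D, H, W)
        b, c, d, h, w = x.shape
        r, c_out = self.r, c // self.r ** 3
        x = x.view(b, c_out, r, r, r, d, h, w)
        x = x.permute(0, 1, 5, 2, 6, 3, 7, 4)      # interleave each axis with its factor
        return x.reshape(b, c_out, d * r, h * r, w * r)

feat = torch.randn(1, 48, 40, 48, 40)              # hypothetical decoder features
up = VoxelShuffleUp(48, 16, r=2)
print(up(feat).shape)                               # torch.Size([1, 16, 80, 96, 80])

Compared with trilinear interpolation, the convolution learns how to populate the finer grid from the coarse features, which is one way such a module can avoid the feature degradation mentioned in the abstract.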