{"title":"基于卷积视觉变换网络的多分辨率差分图像配准","authors":"Tao Xu, Ting Jiang, Haoyang Xing, Xiaoning Li","doi":"10.1145/3603781.3603849","DOIUrl":null,"url":null,"abstract":"In recent years, the research of medical image registration based on convolutional neural network (CNN) has attracted much attention. In particular, the deformable image registration method based on diffeomorphism seems to have achieved promising results due to its unique topology conservation and transformation reversibility. However, the results of most existing learning-based approaches are not necessarily diffeomorphic. Moreover, due to local receptive fields caused by convolutional inductive bias, CNNs usually have limitations in catching the global and remote spatial relationships between points in anatomical images. Vision Transformer (ViT) shows tremendous advantages in modeling long-term dependencies in sequential images due to its embedded self-attention mechanism. Therefore, we propose a hybrid convolution Vision Transformer Network (CViT) model based on multi-resolution diffeomorphism. The model employs a multi-resolution strategy to learn global connectivity and local context of medical images in the diffeomorphic mapping space, which can simultaneously integrate the advantages of CNN and ViT to provide a better understanding of spatial correspondence. We evaluate our approach respectively on a large scale and a small scale dataset of 3D brain MRI scans, gaining an average Dice of 0.813 on the OASIS dataset. Extensive quantitative and qualitative results show that our method achieves state-of-the-art performance while maintaining desirable diffeomorphism.","PeriodicalId":391180,"journal":{"name":"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-Resolution Diffeomorphic Image Registration with Convolutional Vision Transformer Network\",\"authors\":\"Tao Xu, Ting Jiang, Haoyang Xing, Xiaoning Li\",\"doi\":\"10.1145/3603781.3603849\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, the research of medical image registration based on convolutional neural network (CNN) has attracted much attention. In particular, the deformable image registration method based on diffeomorphism seems to have achieved promising results due to its unique topology conservation and transformation reversibility. However, the results of most existing learning-based approaches are not necessarily diffeomorphic. Moreover, due to local receptive fields caused by convolutional inductive bias, CNNs usually have limitations in catching the global and remote spatial relationships between points in anatomical images. Vision Transformer (ViT) shows tremendous advantages in modeling long-term dependencies in sequential images due to its embedded self-attention mechanism. Therefore, we propose a hybrid convolution Vision Transformer Network (CViT) model based on multi-resolution diffeomorphism. The model employs a multi-resolution strategy to learn global connectivity and local context of medical images in the diffeomorphic mapping space, which can simultaneously integrate the advantages of CNN and ViT to provide a better understanding of spatial correspondence. 
We evaluate our approach respectively on a large scale and a small scale dataset of 3D brain MRI scans, gaining an average Dice of 0.813 on the OASIS dataset. Extensive quantitative and qualitative results show that our method achieves state-of-the-art performance while maintaining desirable diffeomorphism.\",\"PeriodicalId\":391180,\"journal\":{\"name\":\"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3603781.3603849\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3603781.3603849","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Multi-Resolution Diffeomorphic Image Registration with Convolutional Vision Transformer Network
In recent years, research on medical image registration based on convolutional neural networks (CNNs) has attracted considerable attention. In particular, deformable image registration methods based on diffeomorphisms have achieved promising results owing to their topology preservation and transformation invertibility. However, the deformations produced by most existing learning-based approaches are not necessarily diffeomorphic. Moreover, because the convolutional inductive bias yields local receptive fields, CNNs are limited in capturing global, long-range spatial relationships between points in anatomical images. The Vision Transformer (ViT), with its self-attention mechanism, shows clear advantages in modeling long-range dependencies in image sequences. We therefore propose a hybrid Convolutional Vision Transformer Network (CViT) based on multi-resolution diffeomorphic registration. The model employs a multi-resolution strategy to learn both the global connectivity and the local context of medical images in the diffeomorphic mapping space, integrating the strengths of CNNs and ViTs for a better understanding of spatial correspondence. We evaluate our approach on a large-scale and a small-scale dataset of 3D brain MRI scans, achieving an average Dice score of 0.813 on the OASIS dataset. Extensive quantitative and qualitative results show that our method achieves state-of-the-art performance while maintaining desirable diffeomorphic properties.
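The abstract does not detail how the diffeomorphic mapping is realized, but a common choice in learning-based registration is to have the network predict a stationary velocity field and integrate it by scaling and squaring, which yields an invertible, topology-preserving deformation. The sketch below illustrates only that integration step in PyTorch; the function names, tensor layout, and number of squaring steps are illustrative assumptions and not the paper's implementation.

```python
import torch
import torch.nn.functional as F


def make_identity_grid(shape, device):
    """Normalized identity sampling grid in [-1, 1], shape (1, D, H, W, 3), ordered (x, y, z)."""
    vectors = [torch.linspace(-1, 1, s, device=device) for s in shape]
    grids = torch.meshgrid(*vectors, indexing="ij")      # each of shape (D, H, W)
    grid = torch.stack(grids[::-1], dim=-1)              # reverse to (x, y, z) for grid_sample
    return grid.unsqueeze(0)


def scaling_and_squaring(velocity, steps=7):
    """
    Integrate a stationary velocity field into a diffeomorphic displacement:
    phi = exp(v) approximated by scaling v down by 2**steps and composing the
    resulting small deformation with itself `steps` times.

    velocity: (1, 3, D, H, W) tensor, displacements expressed in the normalized
              [-1, 1] coordinate convention of grid_sample, channels ordered (x, y, z).
              (These layout choices are assumptions for this sketch.)
    """
    shape = velocity.shape[2:]
    identity = make_identity_grid(shape, velocity.device)    # (1, D, H, W, 3)
    disp = velocity / (2 ** steps)                            # scale: small initial displacement
    for _ in range(steps):
        # Square: u <- u + u o (Id + u), i.e. sample u at the positions Id + u.
        grid = identity + disp.permute(0, 2, 3, 4, 1)
        warped = F.grid_sample(disp, grid, mode="bilinear",
                               padding_mode="border", align_corners=True)
        disp = disp + warped
    return disp                                               # final displacement field
```

With around 7 squaring steps the small initial deformation stays close to the identity, which is what keeps each composition (and hence the final deformation) invertible; the same displacement-plus-grid convention can then be used to warp the moving image.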