{"title":"基于卷积视觉变换网络的多分辨率差分图像配准","authors":"Tao Xu, Ting Jiang, Haoyang Xing, Xiaoning Li","doi":"10.1145/3603781.3603849","DOIUrl":null,"url":null,"abstract":"In recent years, the research of medical image registration based on convolutional neural network (CNN) has attracted much attention. In particular, the deformable image registration method based on diffeomorphism seems to have achieved promising results due to its unique topology conservation and transformation reversibility. However, the results of most existing learning-based approaches are not necessarily diffeomorphic. Moreover, due to local receptive fields caused by convolutional inductive bias, CNNs usually have limitations in catching the global and remote spatial relationships between points in anatomical images. Vision Transformer (ViT) shows tremendous advantages in modeling long-term dependencies in sequential images due to its embedded self-attention mechanism. Therefore, we propose a hybrid convolution Vision Transformer Network (CViT) model based on multi-resolution diffeomorphism. The model employs a multi-resolution strategy to learn global connectivity and local context of medical images in the diffeomorphic mapping space, which can simultaneously integrate the advantages of CNN and ViT to provide a better understanding of spatial correspondence. We evaluate our approach respectively on a large scale and a small scale dataset of 3D brain MRI scans, gaining an average Dice of 0.813 on the OASIS dataset. Extensive quantitative and qualitative results show that our method achieves state-of-the-art performance while maintaining desirable diffeomorphism.","PeriodicalId":391180,"journal":{"name":"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-Resolution Diffeomorphic Image Registration with Convolutional Vision Transformer Network\",\"authors\":\"Tao Xu, Ting Jiang, Haoyang Xing, Xiaoning Li\",\"doi\":\"10.1145/3603781.3603849\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, the research of medical image registration based on convolutional neural network (CNN) has attracted much attention. In particular, the deformable image registration method based on diffeomorphism seems to have achieved promising results due to its unique topology conservation and transformation reversibility. However, the results of most existing learning-based approaches are not necessarily diffeomorphic. Moreover, due to local receptive fields caused by convolutional inductive bias, CNNs usually have limitations in catching the global and remote spatial relationships between points in anatomical images. Vision Transformer (ViT) shows tremendous advantages in modeling long-term dependencies in sequential images due to its embedded self-attention mechanism. Therefore, we propose a hybrid convolution Vision Transformer Network (CViT) model based on multi-resolution diffeomorphism. The model employs a multi-resolution strategy to learn global connectivity and local context of medical images in the diffeomorphic mapping space, which can simultaneously integrate the advantages of CNN and ViT to provide a better understanding of spatial correspondence. 
We evaluate our approach respectively on a large scale and a small scale dataset of 3D brain MRI scans, gaining an average Dice of 0.813 on the OASIS dataset. Extensive quantitative and qualitative results show that our method achieves state-of-the-art performance while maintaining desirable diffeomorphism.\",\"PeriodicalId\":391180,\"journal\":{\"name\":\"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3603781.3603849\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3603781.3603849","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Multi-Resolution Diffeomorphic Image Registration with Convolutional Vision Transformer Network
In recent years, research on medical image registration based on convolutional neural networks (CNNs) has attracted considerable attention. In particular, deformable image registration methods based on diffeomorphisms have achieved promising results owing to their topology preservation and transformation invertibility. However, the deformations produced by most existing learning-based approaches are not necessarily diffeomorphic. Moreover, because the convolutional inductive bias yields local receptive fields, CNNs are limited in capturing global, long-range spatial relationships between points in anatomical images. The Vision Transformer (ViT), with its self-attention mechanism, shows clear advantages in modeling long-range dependencies in image sequences. We therefore propose a hybrid Convolutional Vision Transformer Network (CViT) based on multi-resolution diffeomorphic registration. The model employs a multi-resolution strategy to learn both the global connectivity and the local context of medical images in the diffeomorphic mapping space, integrating the strengths of CNNs and ViTs for a better understanding of spatial correspondence. We evaluate our approach on a large-scale and a small-scale dataset of 3D brain MRI scans, achieving an average Dice score of 0.813 on the OASIS dataset. Extensive quantitative and qualitative results show that our method achieves state-of-the-art performance while maintaining desirable diffeomorphic properties.
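The abstract does not detail how the diffeomorphic mapping is realized, but a common choice in learning-based registration is to have the network predict a stationary velocity field and integrate it by scaling and squaring, which yields an invertible, topology-preserving deformation. The sketch below illustrates only that integration step in PyTorch; the function names, tensor layout, and number of squaring steps are illustrative assumptions and not the paper's implementation.

```python
import torch
import torch.nn.functional as F


def make_identity_grid(shape, device):
    """Normalized identity sampling grid in [-1, 1], shape (1, D, H, W, 3), ordered (x, y, z)."""
    vectors = [torch.linspace(-1, 1, s, device=device) for s in shape]
    grids = torch.meshgrid(*vectors, indexing="ij")      # each of shape (D, H, W)
    grid = torch.stack(grids[::-1], dim=-1)              # reverse to (x, y, z) for grid_sample
    return grid.unsqueeze(0)


def scaling_and_squaring(velocity, steps=7):
    """
    Integrate a stationary velocity field into a diffeomorphic displacement:
    phi = exp(v) approximated by scaling v down by 2**steps and composing the
    resulting small deformation with itself `steps` times.

    velocity: (1, 3, D, H, W) tensor, displacements expressed in the normalized
              [-1, 1] coordinate convention of grid_sample, channels ordered (x, y, z).
              (These layout choices are assumptions for this sketch.)
    """
    shape = velocity.shape[2:]
    identity = make_identity_grid(shape, velocity.device)    # (1, D, H, W, 3)
    disp = velocity / (2 ** steps)                            # scale: small initial displacement
    for _ in range(steps):
        # Square: u <- u + u o (Id + u), i.e. sample u at the positions Id + u.
        grid = identity + disp.permute(0, 2, 3, 4, 1)
        warped = F.grid_sample(disp, grid, mode="bilinear",
                               padding_mode="border", align_corners=True)
        disp = disp + warped
    return disp                                               # final displacement field
```

With around 7 squaring steps the small initial deformation stays close to the identity, which is what keeps each composition (and hence the final deformation) invertible; the same displacement-plus-grid convention can then be used to warp the moving image.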