Multi-Resolution Diffeomorphic Image Registration with Convolutional Vision Transformer Network

Tao Xu, Ting Jiang, Haoyang Xing, Xiaoning Li
{"title":"基于卷积视觉变换网络的多分辨率差分图像配准","authors":"Tao Xu, Ting Jiang, Haoyang Xing, Xiaoning Li","doi":"10.1145/3603781.3603849","DOIUrl":null,"url":null,"abstract":"In recent years, the research of medical image registration based on convolutional neural network (CNN) has attracted much attention. In particular, the deformable image registration method based on diffeomorphism seems to have achieved promising results due to its unique topology conservation and transformation reversibility. However, the results of most existing learning-based approaches are not necessarily diffeomorphic. Moreover, due to local receptive fields caused by convolutional inductive bias, CNNs usually have limitations in catching the global and remote spatial relationships between points in anatomical images. Vision Transformer (ViT) shows tremendous advantages in modeling long-term dependencies in sequential images due to its embedded self-attention mechanism. Therefore, we propose a hybrid convolution Vision Transformer Network (CViT) model based on multi-resolution diffeomorphism. The model employs a multi-resolution strategy to learn global connectivity and local context of medical images in the diffeomorphic mapping space, which can simultaneously integrate the advantages of CNN and ViT to provide a better understanding of spatial correspondence. We evaluate our approach respectively on a large scale and a small scale dataset of 3D brain MRI scans, gaining an average Dice of 0.813 on the OASIS dataset. Extensive quantitative and qualitative results show that our method achieves state-of-the-art performance while maintaining desirable diffeomorphism.","PeriodicalId":391180,"journal":{"name":"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-Resolution Diffeomorphic Image Registration with Convolutional Vision Transformer Network\",\"authors\":\"Tao Xu, Ting Jiang, Haoyang Xing, Xiaoning Li\",\"doi\":\"10.1145/3603781.3603849\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, the research of medical image registration based on convolutional neural network (CNN) has attracted much attention. In particular, the deformable image registration method based on diffeomorphism seems to have achieved promising results due to its unique topology conservation and transformation reversibility. However, the results of most existing learning-based approaches are not necessarily diffeomorphic. Moreover, due to local receptive fields caused by convolutional inductive bias, CNNs usually have limitations in catching the global and remote spatial relationships between points in anatomical images. Vision Transformer (ViT) shows tremendous advantages in modeling long-term dependencies in sequential images due to its embedded self-attention mechanism. Therefore, we propose a hybrid convolution Vision Transformer Network (CViT) model based on multi-resolution diffeomorphism. The model employs a multi-resolution strategy to learn global connectivity and local context of medical images in the diffeomorphic mapping space, which can simultaneously integrate the advantages of CNN and ViT to provide a better understanding of spatial correspondence. 
We evaluate our approach respectively on a large scale and a small scale dataset of 3D brain MRI scans, gaining an average Dice of 0.813 on the OASIS dataset. Extensive quantitative and qualitative results show that our method achieves state-of-the-art performance while maintaining desirable diffeomorphism.\",\"PeriodicalId\":391180,\"journal\":{\"name\":\"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3603781.3603849\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3603781.3603849","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

In recent years, medical image registration based on convolutional neural networks (CNNs) has attracted considerable attention. In particular, deformable registration methods built on diffeomorphisms appear to achieve promising results owing to their topology preservation and transformation invertibility. However, the transformations produced by most existing learning-based approaches are not necessarily diffeomorphic. Moreover, because the convolutional inductive bias restricts receptive fields to local neighborhoods, CNNs are limited in capturing global, long-range spatial relationships between points in anatomical images. The Vision Transformer (ViT), with its built-in self-attention mechanism, shows clear advantages in modeling such long-range dependencies. We therefore propose a hybrid Convolutional Vision Transformer network (CViT) based on multi-resolution diffeomorphic registration. The model uses a multi-resolution strategy to learn both the global connectivity and the local context of medical images in the diffeomorphic mapping space, combining the strengths of CNNs and ViT for a better understanding of spatial correspondence. We evaluate the approach on a large-scale and a small-scale dataset of 3D brain MRI scans, obtaining an average Dice score of 0.813 on the OASIS dataset. Extensive quantitative and qualitative results show that the method achieves state-of-the-art performance while preserving the desired diffeomorphic properties.
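The paper's implementation is not reproduced here, but the diffeomorphic property it relies on is typically obtained in learning-based methods by integrating a stationary velocity field with scaling and squaring. The sketch below illustrates that construction in PyTorch under illustrative assumptions: displacement fields are expressed in normalized [-1, 1] grid units, velocity channels are ordered (z, y, x) to match the sampling grid, and the number of integration steps is an arbitrary choice.

```python
# A minimal sketch (assumptions, not the paper's published code): turning a
# stationary velocity field into a diffeomorphic deformation by scaling and
# squaring, i.e. repeated self-composition of a small displacement field.
import torch
import torch.nn.functional as F


def make_identity_grid(shape):
    """Identity sampling grid in normalized [-1, 1] coordinates, shape [1, D, H, W, 3]."""
    vectors = [torch.linspace(-1.0, 1.0, s) for s in shape]
    grids = torch.meshgrid(*vectors, indexing="ij")
    grid = torch.stack(grids, dim=-1)             # [D, H, W, 3] in (z, y, x) order
    return grid.flip(-1).unsqueeze(0)             # grid_sample expects (x, y, z) last


def warp(volume, flow, identity_grid):
    """Warp a [B, C, D, H, W] volume by a displacement field given in normalized units."""
    # flow channels are assumed to be ordered (z, y, x) to match the grid axes
    grid = identity_grid + flow.permute(0, 2, 3, 4, 1).flip(-1)
    return F.grid_sample(volume, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)


def integrate_velocity(velocity, num_steps=7):
    """Scaling and squaring: approximate phi = exp(v) with 2**num_steps compositions."""
    identity = make_identity_grid(velocity.shape[2:]).to(velocity.device)
    flow = velocity / (2 ** num_steps)            # start from a small displacement
    for _ in range(num_steps):
        flow = flow + warp(flow, flow, identity)  # compose the field with itself
    return flow                                   # displacement of a diffeomorphic map
```

Because the final map is a composition of many small, smooth displacements, it remains invertible and preserves topology, which is the property the abstract highlights.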
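The reported accuracy is an average Dice overlap between warped and fixed segmentation labels. As a point of reference only (the paper's evaluation scripts are not available here), a minimal per-structure Dice computation might look like the following; `warped_seg` and `fixed_seg` are hypothetical integer label volumes.

```python
# Illustrative Dice evaluation: one score per anatomical label, averaged over
# labels and scan pairs to give figures such as the 0.813 reported on OASIS.
import numpy as np


def dice_per_label(warped_seg, fixed_seg, labels):
    """Dice coefficient for each label in two integer segmentation volumes."""
    scores = {}
    for label in labels:
        w = warped_seg == label
        f = fixed_seg == label
        denom = w.sum() + f.sum()
        scores[label] = 2.0 * np.logical_and(w, f).sum() / denom if denom > 0 else np.nan
    return scores
```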