多尺度表面视觉转换器。

ArXiv Pub Date : 2024-06-11

Simon Dahan, Logan Z J Williams, Daniel Rueckert, Emma C Robinson

{"title":"多尺度表面视觉转换器。","authors":"Simon Dahan, Logan Z J Williams, Daniel Rueckert, Emma C Robinson","doi":"","DOIUrl":null,"url":null,"abstract":"Surface meshes are a favoured domain for representing structural and functional information on the human cortex, but their complex topology and geometry pose significant challenges for deep learning analysis. While Transformers have excelled as domainagnostic architectures for sequence-to-sequence learning, the quadratic cost of the self-attention operation remains an obstacle for many dense prediction tasks. Inspired by some of the latest advances in hierarchical modelling with vision transformers, we introduce the Multiscale Surface Vision Transformer (MS-SiT) as a backbone architecture for surface deep learning. The self-attention mechanism is applied within local-mesh-windows to allow for high-resolution sampling of the underlying data, while a shifted-window strategy improves the sharing of information between windows. Neighbouring patches are successively merged, allowing the MS-SiT to learn hierarchical representations suitable for any prediction task. Results demonstrate that the MS-SiT outperforms existing surface deep learning methods for neonatal phenotyping prediction tasks using the Developing Human Connectome Project (dHCP) dataset. Furthermore, building the MS-SiT backbone into a U-shaped architecture for surface segmentation demonstrates competitive results on cortical parcellation using the UK Biobank (UKB) and manually-annotated MindBoggle datasets. Code and trained models are publicly available at https://github.com/metrics-lab/surface-vision-transformers.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10055498/pdf/","citationCount":"0","resultStr":"{\"title\":\"The Multiscale Surface Vision Transformer.\",\"authors\":\"Simon Dahan, Logan Z J Williams, Daniel Rueckert, Emma C Robinson\",\"doi\":\"\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Surface meshes are a favoured domain for representing structural and functional information on the human cortex, but their complex topology and geometry pose significant challenges for deep learning analysis. While Transformers have excelled as domainagnostic architectures for sequence-to-sequence learning, the quadratic cost of the self-attention operation remains an obstacle for many dense prediction tasks. Inspired by some of the latest advances in hierarchical modelling with vision transformers, we introduce the Multiscale Surface Vision Transformer (MS-SiT) as a backbone architecture for surface deep learning. The self-attention mechanism is applied within local-mesh-windows to allow for high-resolution sampling of the underlying data, while a shifted-window strategy improves the sharing of information between windows. Neighbouring patches are successively merged, allowing the MS-SiT to learn hierarchical representations suitable for any prediction task. Results demonstrate that the MS-SiT outperforms existing surface deep learning methods for neonatal phenotyping prediction tasks using the Developing Human Connectome Project (dHCP) dataset. Furthermore, building the MS-SiT backbone into a U-shaped architecture for surface segmentation demonstrates competitive results on cortical parcellation using the UK Biobank (UKB) and manually-annotated MindBoggle datasets. Code and trained models are publicly available at https://github.com/metrics-lab/surface-vision-transformers.\",\"PeriodicalId\":8425,\"journal\":{\"name\":\"ArXiv\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-06-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10055498/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ArXiv\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ArXiv","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

表面网格是表示人类皮层结构和功能信息的一个受欢迎的领域，但其复杂的拓扑结构和几何结构对深度学习分析提出了重大挑战。虽然Transformers在序列到序列学习的领域不可知架构方面表现出色，尤其是在卷积运算的转换不是平凡的结构中，但自注意运算的二次代价仍然是许多密集预测任务的障碍。受视觉转换器分层建模的一些最新进展的启发，我们引入了多尺度表面视觉转换器（MS-SiT）作为表面深度学习的主干架构。自注意机制应用于局部网格窗口中，以允许对底层数据进行高分辨率采样，而移位窗口策略则改善了窗口之间的信息共享。相邻的补丁被连续地合并，从而允许MS-SiT学习适用于任何预测任务的分层表示。结果表明，在使用开发人类连接体项目（dHCP）数据集进行新生儿表型预测任务方面，MS-SiT优于现有的表面深度学习方法。此外，使用英国生物库（UKB）和手动注释的MindBoggle数据集，将MS-SiT主干构建成用于表面分割的U形架构，证明了皮层分割的竞争结果。代码和经过训练的模型可在https://github.com/metrics-lab/surface-vision-transformers.

本文章由计算机程序翻译，如有差异，请以英文原文为准。

The Multiscale Surface Vision Transformer.

本刊更多论文

The Multiscale Surface Vision Transformer.

Surface meshes are a favoured domain for representing structural and functional information on the human cortex, but their complex topology and geometry pose significant challenges for deep learning analysis. While Transformers have excelled as domainagnostic architectures for sequence-to-sequence learning, the quadratic cost of the self-attention operation remains an obstacle for many dense prediction tasks. Inspired by some of the latest advances in hierarchical modelling with vision transformers, we introduce the Multiscale Surface Vision Transformer (MS-SiT) as a backbone architecture for surface deep learning. The self-attention mechanism is applied within local-mesh-windows to allow for high-resolution sampling of the underlying data, while a shifted-window strategy improves the sharing of information between windows. Neighbouring patches are successively merged, allowing the MS-SiT to learn hierarchical representations suitable for any prediction task. Results demonstrate that the MS-SiT outperforms existing surface deep learning methods for neonatal phenotyping prediction tasks using the Developing Human Connectome Project (dHCP) dataset. Furthermore, building the MS-SiT backbone into a U-shaped architecture for surface segmentation demonstrates competitive results on cortical parcellation using the UK Biobank (UKB) and manually-annotated MindBoggle datasets. Code and trained models are publicly available at https://github.com/metrics-lab/surface-vision-transformers.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ArXiv

自引率

0.00%

发文量