Transformer-based end-to-end classification of variable-length volumetric data

Marzieh Oghbaie, Teresa Araújo, T. Emre, U. Schmidt-Erfurth, H. Bogunović
DOI: 10.48550/arXiv.2307.06666
Published: 2023-07-13, in Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 358-367.

Abstract

The automatic classification of 3D medical data is memory-intensive. In addition, variation in the number of slices between samples is common. Naïve solutions such as subsampling can address these problems, but at the cost of potentially eliminating relevant diagnostic information. Transformers have shown promising performance for sequential data analysis. However, their application to long sequences is data-, computation-, and memory-demanding. In this paper, we propose an end-to-end Transformer-based framework that classifies volumetric data of variable length in an efficient fashion. In particular, by randomizing the input volume-wise resolution (number of slices) during training, we enhance the capacity of the learnable positional embedding assigned to each volume slice. Consequently, the positional information accumulated in each embedding generalizes to neighbouring slices, even for high-resolution volumes at test time. As a result, the model is more robust to variable volume length and amenable to different computational budgets. We evaluated the proposed approach on retinal OCT volume classification and achieved a 21.96% average improvement in balanced accuracy on a 9-class diagnostic task, compared to state-of-the-art video transformers. Our findings show that varying the volume-wise resolution of the input during training yields a more informative volume representation than training with a fixed number of slices per volume.
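The core idea of the abstract — randomizing the number of slices per training batch and adapting per-slice positional embeddings to whatever resolution arrives — can be illustrated with a minimal sketch. This is not the authors' implementation; the table size, resolution range, and linear interpolation of the embedding table are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical settings (not from the paper): a learnable positional-embedding
# table with max_slices entries of dimension d_model.
max_slices, d_model = 64, 16
pos_table = rng.normal(size=(max_slices, d_model))

def resize_pos_embed(table, n_slices):
    """Linearly interpolate the embedding table along the slice axis so that a
    volume with any slice count receives one position code per slice."""
    src = np.linspace(0.0, 1.0, num=table.shape[0])
    dst = np.linspace(0.0, 1.0, num=n_slices)
    return np.stack(
        [np.interp(dst, src, table[:, d]) for d in range(table.shape[1])],
        axis=1,
    )

def sample_training_resolution(low=8, high=64):
    """Randomize the volume-wise resolution (#slices) for a training batch."""
    return int(rng.integers(low, high + 1))

# During training, each batch sees a different resolution; at test time the
# same interpolation serves high-resolution volumes without retraining.
for _ in range(3):
    s = sample_training_resolution()
    pe = resize_pos_embed(pos_table, s)
    assert pe.shape == (s, d_model)
```

Because neighbouring interpolated positions share information from the same table entries, each learned embedding is encouraged to be valid for nearby slices, which is the robustness property the abstract attributes to resolution randomization.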