Transformer-based end-to-end classification of variable-length volumetric data

Marzieh Oghbaie, Teresa Araújo, T. Emre, U. Schmidt-Erfurth, H. Bogunović
DOI: 10.48550/arXiv.2307.06666
Published: 2023-07-13, in Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 358-367.

Abstract

The automatic classification of 3D medical data is memory-intensive. In addition, variation in the number of slices between samples is common. Naïve solutions such as subsampling can address these problems, but at the cost of potentially eliminating relevant diagnostic information. Transformers have shown promising performance for sequential data analysis. However, their application to long sequences is data-, computation-, and memory-demanding. In this paper, we propose an end-to-end Transformer-based framework that classifies volumetric data of variable length in an efficient fashion. In particular, by randomizing the input volume-wise resolution (number of slices) during training, we enhance the capacity of the learnable positional embedding assigned to each volume slice. Consequently, the positional information accumulated in each embedding generalizes to neighbouring slices, even for high-resolution volumes at test time. As a result, the model is more robust to variable volume length and amenable to different computational budgets. We evaluated the proposed approach on retinal OCT volume classification and achieved a 21.96% average improvement in balanced accuracy on a 9-class diagnostic task, compared to state-of-the-art video transformers. Our findings show that varying the volume-wise resolution of the input during training yields a more informative volume representation than training with a fixed number of slices per volume.
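The core idea of the abstract — randomizing the number of slices per training batch and adapting per-slice positional embeddings to whatever resolution arrives — can be illustrated with a minimal sketch. This is not the authors' implementation; the table size, resolution range, and linear interpolation of the embedding table are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical settings (not from the paper): a learnable positional-embedding
# table with max_slices entries of dimension d_model.
max_slices, d_model = 64, 16
pos_table = rng.normal(size=(max_slices, d_model))

def resize_pos_embed(table, n_slices):
    """Linearly interpolate the embedding table along the slice axis so that a
    volume with any slice count receives one position code per slice."""
    src = np.linspace(0.0, 1.0, num=table.shape[0])
    dst = np.linspace(0.0, 1.0, num=n_slices)
    return np.stack(
        [np.interp(dst, src, table[:, d]) for d in range(table.shape[1])],
        axis=1,
    )

def sample_training_resolution(low=8, high=64):
    """Randomize the volume-wise resolution (#slices) for a training batch."""
    return int(rng.integers(low, high + 1))

# During training, each batch sees a different resolution; at test time the
# same interpolation serves high-resolution volumes without retraining.
for _ in range(3):
    s = sample_training_resolution()
    pe = resize_pos_embed(pos_table, s)
    assert pe.shape == (s, d_model)
```

Because neighbouring interpolated positions share information from the same table entries, each learned embedding is encouraged to be valid for nearby slices, which is the robustness property the abstract attributes to resolution randomization.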