vox2vec:医学图像体素级表示的自监督对比学习框架

Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention Pub Date : 2023-07-27 DOI:10.48550/arXiv.2307.14725

M. Goncharov, Vera Soboleva, Anvar Kurmukov, M. Pisov, M. Belyaev

{"title":"vox2vec:医学图像体素级表示的自监督对比学习框架","authors":"M. Goncharov, Vera Soboleva, Anvar Kurmukov, M. Pisov, M. Belyaev","doi":"10.48550/arXiv.2307.14725","DOIUrl":null,"url":null,"abstract":"This paper introduces vox2vec - a contrastive method for self-supervised learning (SSL) of voxel-level representations. vox2vec representations are modeled by a Feature Pyramid Network (FPN): a voxel representation is a concatenation of the corresponding feature vectors from different pyramid levels. The FPN is pre-trained to produce similar representations for the same voxel in different augmented contexts and distinctive representations for different voxels. This results in unified multi-scale representations that capture both global semantics (e.g., body part) and local semantics (e.g., different small organs or healthy versus tumor tissue). We use vox2vec to pre-train a FPN on more than 6500 publicly available computed tomography images. We evaluate the pre-trained representations by attaching simple heads on top of them and training the resulting models for 22 segmentation tasks. We show that vox2vec outperforms existing medical imaging SSL techniques in three evaluation setups: linear and non-linear probing and end-to-end fine-tuning. Moreover, a non-linear head trained on top of the frozen vox2vec representations achieves competitive performance with the FPN trained from scratch while having 50 times fewer trainable parameters. The code is available at https://github.com/mishgon/vox2vec .","PeriodicalId":18289,"journal":{"name":"Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention","volume":"87 1","pages":"605-614"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"vox2vec: A Framework for Self-supervised Contrastive Learning of Voxel-level Representations in Medical Images\",\"authors\":\"M. Goncharov, Vera Soboleva, Anvar Kurmukov, M. Pisov, M. Belyaev\",\"doi\":\"10.48550/arXiv.2307.14725\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper introduces vox2vec - a contrastive method for self-supervised learning (SSL) of voxel-level representations. vox2vec representations are modeled by a Feature Pyramid Network (FPN): a voxel representation is a concatenation of the corresponding feature vectors from different pyramid levels. The FPN is pre-trained to produce similar representations for the same voxel in different augmented contexts and distinctive representations for different voxels. This results in unified multi-scale representations that capture both global semantics (e.g., body part) and local semantics (e.g., different small organs or healthy versus tumor tissue). We use vox2vec to pre-train a FPN on more than 6500 publicly available computed tomography images. We evaluate the pre-trained representations by attaching simple heads on top of them and training the resulting models for 22 segmentation tasks. We show that vox2vec outperforms existing medical imaging SSL techniques in three evaluation setups: linear and non-linear probing and end-to-end fine-tuning. Moreover, a non-linear head trained on top of the frozen vox2vec representations achieves competitive performance with the FPN trained from scratch while having 50 times fewer trainable parameters. The code is available at https://github.com/mishgon/vox2vec .\",\"PeriodicalId\":18289,\"journal\":{\"name\":\"Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention\",\"volume\":\"87 1\",\"pages\":\"605-614\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2307.14725\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2307.14725","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

本文介绍了vox2vec——一种体素级表示的自监督学习(SSL)对比方法。vox2vec表示由特征金字塔网络(FPN)建模:体素表示是来自不同金字塔级别的相应特征向量的连接。FPN经过预训练，可以在不同的增强环境中为相同的体素产生相似的表示，并为不同的体素产生不同的表示。这导致了统一的多尺度表示，可以捕获全局语义(例如，身体部位)和局部语义(例如，不同的小器官或健康与肿瘤组织)。我们使用vox2vec在6500多张公开可用的计算机断层扫描图像上预训练FPN。我们通过在预训练的表示上附加简单的头部来评估它们，并为22个分割任务训练生成的模型。我们表明，vox2vec在三个评估设置中优于现有的医学成像SSL技术:线性和非线性探测以及端到端微调。此外，在冻结的vox2vec表示之上训练的非线性头部与从头开始训练的FPN相比具有竞争性性能，同时可训练参数减少了50倍。代码可在https://github.com/mishgon/vox2vec上获得。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

vox2vec: A Framework for Self-supervised Contrastive Learning of Voxel-level Representations in Medical Images

This paper introduces vox2vec - a contrastive method for self-supervised learning (SSL) of voxel-level representations. vox2vec representations are modeled by a Feature Pyramid Network (FPN): a voxel representation is a concatenation of the corresponding feature vectors from different pyramid levels. The FPN is pre-trained to produce similar representations for the same voxel in different augmented contexts and distinctive representations for different voxels. This results in unified multi-scale representations that capture both global semantics (e.g., body part) and local semantics (e.g., different small organs or healthy versus tumor tissue). We use vox2vec to pre-train a FPN on more than 6500 publicly available computed tomography images. We evaluate the pre-trained representations by attaching simple heads on top of them and training the resulting models for 22 segmentation tasks. We show that vox2vec outperforms existing medical imaging SSL techniques in three evaluation setups: linear and non-linear probing and end-to-end fine-tuning. Moreover, a non-linear head trained on top of the frozen vox2vec representations achieves competitive performance with the FPN trained from scratch while having 50 times fewer trainable parameters. The code is available at https://github.com/mishgon/vox2vec .

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention

自引率

0.00%

发文量