Token Sparsification for Faster Medical Image Segmentation

Information processing in medical imaging : proceedings of the ... conference; Pub Date: 2023-03-11; DOI: 10.48550/arXiv.2303.06522

Lei Zhou, Huidong Liu, Joseph Bae, Junjun He, D. Samaras, P. Prasanna

Volume: 19 1; Pages: 743-754

Abstract

Can we use sparse tokens for dense prediction, e.g., segmentation? Although token sparsification has been applied to Vision Transformers (ViT) to accelerate classification, it is still unknown how to perform segmentation from sparse tokens. To this end, we reformulate segmentation as a sparse encoding -> token completion -> dense decoding (SCD) pipeline. We first show empirically that naively applying existing approaches from classification token pruning and masked image modeling (MIM) leads to failure and inefficient training, caused by inappropriate sampling algorithms and the low quality of the restored dense features. In this paper, we propose Soft-topK Token Pruning (STP) and Multi-layer Token Assembly (MTA) to address these problems. In sparse encoding, STP predicts token importance scores with a lightweight sub-network and samples the topK tokens. The intractable topK gradients are approximated through a continuous perturbed score distribution. In token completion, MTA restores a full token sequence by assembling both the sparse output tokens and the pruned multi-layer intermediate ones. The last stage, dense decoding, is compatible with existing segmentation decoders, e.g., UNETR. Experiments show that SCD pipelines equipped with STP and MTA are much faster than baselines without token pruning in both training (up to 120% higher throughput) and inference (up to 60.6% higher throughput) while maintaining segmentation quality.
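The two ideas in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes that the "perturbed score distribution" can be approximated by averaging hard top-K masks over Gaussian-perturbed copies of the scores (one common relaxation of top-K), and that token assembly amounts to scattering kept and pruned tokens back to their original positions. The function names `perturbed_topk_mask` and `assemble_tokens`, and the parameters `n_samples` and `sigma`, are hypothetical.

```python
import numpy as np


def perturbed_topk_mask(scores, k, n_samples=100, sigma=0.05, rng=None):
    """Soft top-k indicator over tokens (illustrative assumption, not the paper's code).

    Adds Gaussian noise to the importance scores, takes a hard top-k mask for
    each noisy copy, and averages the masks. The result lies in [0, 1] per
    token and sums to k.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    noise = rng.normal(size=(n_samples, scores.shape[0]))
    noisy = scores[None, :] + sigma * noise           # (n_samples, N)
    topk_idx = np.argsort(-noisy, axis=1)[:, :k]      # indices of the k largest
    masks = np.zeros_like(noisy)
    np.put_along_axis(masks, topk_idx, 1.0, axis=1)   # hard mask per sample
    return masks.mean(axis=0)                         # soft mask, shape (N,)


def assemble_tokens(n_tokens, kept_idx, kept_tokens, pruned):
    """Rebuild a full token sequence (hypothetical stand-in for token completion).

    `pruned` is a list of (indices, tokens) pairs saved at the layers where
    tokens were dropped; `kept_tokens` are the final sparse encoder outputs.
    """
    full = np.zeros((n_tokens, kept_tokens.shape[1]))
    for idx, toks in pruned:
        full[idx] = toks          # intermediate features of pruned tokens
    full[kept_idx] = kept_tokens  # final features of surviving tokens
    return full


if __name__ == "__main__":
    scores = np.array([0.9, 0.1, 0.8, 0.2, 0.7, 0.3])
    soft = perturbed_topk_mask(scores, k=3)
    kept = np.sort(np.argsort(-soft)[:3])
    dropped = np.setdiff1d(np.arange(scores.size), kept)
    tokens = np.random.default_rng(1).normal(size=(scores.size, 4))
    full = assemble_tokens(scores.size, kept, tokens[kept], [(dropped, tokens[dropped])])
    print(soft, full.shape)
```

In a trainable model the averaged mask would sit inside an autodiff framework so that gradients can reach the score-predicting sub-network; the NumPy version only shows the forward computation.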
