令牌稀疏化实现更快的医学图像分割

Information processing in medical imaging : proceedings of the ... conference Pub Date : 2023-06-01 Epub Date: 2023-06-08 DOI:10.1007/978-3-031-34048-2_57

Lei Zhou, Huidong Liu, Joseph Bae, Junjun He, Dimitris Samaras, Prateek Prasanna

{"title":"令牌稀疏化实现更快的医学图像分割","authors":"Lei Zhou, Huidong Liu, Joseph Bae, Junjun He, Dimitris Samaras, Prateek Prasanna","doi":"10.1007/978-3-031-34048-2_57","DOIUrl":null,"url":null,"abstract":"Can we use sparse tokens for dense prediction, e.g., segmentation? Although token sparsification has been applied to Vision Transformers (ViT) to accelerate classification, it is still unknown how to perform segmentation from sparse tokens. To this end, we reformulate segmentation as a sparse encoding → token completion → dense decoding (SCD) pipeline. We first empirically show that naïvely applying existing approaches from classification token pruning and masked image modeling (MIM) leads to failure and inefficient training caused by inappropriate sampling algorithms and the low quality of the restored dense features. In this paper, we propose Soft-topK Token Pruning (STP) and Multi-layer Token Assembly (MTA) to address these problems. In sparse encoding, STP predicts token importance scores with a lightweight sub-network and samples the topK tokens. The intractable topK gradients are approximated through a continuous perturbed score distribution. In token completion, MTA restores a full token sequence by assembling both sparse output tokens and pruned multi-layer intermediate ones. The last dense decoding stage is compatible with existing segmentation decoders, e.g., UNETR. Experiments show SCD pipelines equipped with STP and MTA are much faster than baselines without token pruning in both training (up to 120% higher throughput) and inference (up to 60.6% higher throughput) while maintaining segmentation quality. Code is available here: https://github.com/cvlab-stonybrook/TokenSparse-for-MedSeg.","PeriodicalId":73379,"journal":{"name":"Information processing in medical imaging : proceedings of the ... conference","volume":"13939 ","pages":"743-754"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11056020/pdf/","citationCount":"0","resultStr":"{\"title\":\"Token Sparsification for Faster Medical Image Segmentation.\",\"authors\":\"Lei Zhou, Huidong Liu, Joseph Bae, Junjun He, Dimitris Samaras, Prateek Prasanna\",\"doi\":\"10.1007/978-3-031-34048-2_57\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Can we use sparse tokens for dense prediction, e.g., segmentation? Although token sparsification has been applied to Vision Transformers (ViT) to accelerate classification, it is still unknown how to perform segmentation from sparse tokens. To this end, we reformulate segmentation as a sparse encoding → token completion → dense decoding (SCD) pipeline. We first empirically show that naïvely applying existing approaches from classification token pruning and masked image modeling (MIM) leads to failure and inefficient training caused by inappropriate sampling algorithms and the low quality of the restored dense features. In this paper, we propose Soft-topK Token Pruning (STP) and Multi-layer Token Assembly (MTA) to address these problems. In sparse encoding, STP predicts token importance scores with a lightweight sub-network and samples the topK tokens. The intractable topK gradients are approximated through a continuous perturbed score distribution. In token completion, MTA restores a full token sequence by assembling both sparse output tokens and pruned multi-layer intermediate ones. The last dense decoding stage is compatible with existing segmentation decoders, e.g., UNETR. Experiments show SCD pipelines equipped with STP and MTA are much faster than baselines without token pruning in both training (up to 120% higher throughput) and inference (up to 60.6% higher throughput) while maintaining segmentation quality. Code is available here: https://github.com/cvlab-stonybrook/TokenSparse-for-MedSeg.\",\"PeriodicalId\":73379,\"journal\":{\"name\":\"Information processing in medical imaging : proceedings of the ... conference\",\"volume\":\"13939 \",\"pages\":\"743-754\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11056020/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information processing in medical imaging : proceedings of the ... conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/978-3-031-34048-2_57\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2023/6/8 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information processing in medical imaging : proceedings of the ... conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/978-3-031-34048-2_57","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/6/8 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

我们能否使用稀疏标记进行密集预测，例如分割？虽然标记稀疏化已被应用于视觉转换器（ViT）以加速分类，但如何利用稀疏标记进行分割仍是未知数。为此，我们将分割重新表述为稀疏编码 → 标记补全 → 密集解码（SCD）流水线。我们首先通过经验证明，天真地应用现有的分类标记修剪和掩蔽图像建模（MIM）方法会导致训练失败和效率低下，原因是采样算法不当和恢复的密集特征质量不高。本文提出了软顶层令牌剪枝（Soft-topK Token Pruning，STP）和多层令牌组装（Multi-layer Token Assembly，MTA）来解决这些问题。在稀疏编码中，STP 通过轻量级子网络预测标记重要性得分，并对 topK 标记进行采样。通过连续的扰动分数分布来近似难以处理的 topK 梯度。在标记完成过程中，MTA 通过组装稀疏输出标记和剪枝多层中间标记来恢复完整的标记序列。最后的密集解码阶段与现有的分段解码器（如 UNETR）兼容。实验表明，配备 STP 和 MTA 的 SCD 管道在训练（吞吐量最多提高 120%）和推理（吞吐量最多提高 60.6%）方面都比没有标记剪枝的基线快得多，同时还能保持分割质量。代码见：https://github.com/cvlab-stonybrook/TokenSparse-for-MedSeg。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Token Sparsification for Faster Medical Image Segmentation.

Can we use sparse tokens for dense prediction, e.g., segmentation? Although token sparsification has been applied to Vision Transformers (ViT) to accelerate classification, it is still unknown how to perform segmentation from sparse tokens. To this end, we reformulate segmentation as a sparse encoding → token completion → dense decoding (SCD) pipeline. We first empirically show that naïvely applying existing approaches from classification token pruning and masked image modeling (MIM) leads to failure and inefficient training caused by inappropriate sampling algorithms and the low quality of the restored dense features. In this paper, we propose Soft-topK Token Pruning (STP) and Multi-layer Token Assembly (MTA) to address these problems. In sparse encoding, STP predicts token importance scores with a lightweight sub-network and samples the topK tokens. The intractable topK gradients are approximated through a continuous perturbed score distribution. In token completion, MTA restores a full token sequence by assembling both sparse output tokens and pruned multi-layer intermediate ones. The last dense decoding stage is compatible with existing segmentation decoders, e.g., UNETR. Experiments show SCD pipelines equipped with STP and MTA are much faster than baselines without token pruning in both training (up to 120% higher throughput) and inference (up to 60.6% higher throughput) while maintaining segmentation quality. Code is available here: https://github.com/cvlab-stonybrook/TokenSparse-for-MedSeg.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Information processing in medical imaging : proceedings of the ... conference

自引率

0.00%

发文量