DiSparse: Disentangled Sparsification for Multitask Model Compression
Xing Sun, Ali Hassani, Zhangyang Wang, Gao Huang, Humphrey Shi
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022. DOI: 10.1109/CVPR52688.2022.01206
Citations: 7
Abstract
Despite the popularity of Model Compression and Multitask Learning, how to effectively compress a multitask model has been less thoroughly analyzed due to the challenging entanglement of tasks in the parameter space. In this paper, we propose DiSparse, a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme. We consider each task independently by disentangling the importance measurement and take unanimous decisions among all tasks when performing parameter pruning and selection. Our experimental results demonstrate superior performance on various configurations and settings compared to popular sparse training and pruning methods. Besides its effectiveness in compression, DiSparse also provides a powerful tool to the multitask learning community. Surprisingly, we even observed better performance than some dedicated multitask learning methods in several cases despite the high model sparsity enforced by DiSparse. We analyzed the pruning masks generated with DiSparse and observed strikingly similar sparse network architectures identified by each task even before training starts. We also observe the existence of a "watershed" layer where the task relatedness sharply drops, implying no benefit in continued parameter sharing. Our code and models will be available at: https://github.com/SHI-Labs/DiSparse-Multitask-Model-Compression.
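The abstract only sketches the core mechanism at a high level. The snippet below is one possible reading of it, not the paper's actual implementation: each task scores the shared parameters independently, and a weight is pruned only when all tasks unanimously rank it below their keep threshold. The importance proxy (|weight| × |gradient| stand-in), the keep_ratio value, and the helper names are illustrative assumptions.

```python
# Minimal sketch of a disentangled, unanimous pruning decision, assuming:
# - each task produces its own importance score per shared weight,
# - a weight is kept if ANY task keeps it, i.e. pruned only by unanimous vote.
import torch

def per_task_keep_mask(importance: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Keep the top `keep_ratio` fraction of weights by importance for one task."""
    k = max(1, int(keep_ratio * importance.numel()))
    threshold = torch.topk(importance.flatten(), k).values.min()
    return importance >= threshold

def unanimous_prune_mask(task_importances, keep_ratio=0.1):
    """A weight survives if any task deems it important; it is pruned only
    when all tasks agree it falls below their own keep threshold."""
    keep = torch.zeros_like(task_importances[0], dtype=torch.bool)
    for imp in task_importances:
        keep |= per_task_keep_mask(imp, keep_ratio)
    return keep

# Toy usage: three tasks, each scoring the same shared weight tensor with a
# hypothetical |weight| * |gradient| proxy.
weights = torch.randn(256, 256)
task_importances = [weights.abs() * torch.randn(256, 256).abs() for _ in range(3)]
mask = unanimous_prune_mask(task_importances, keep_ratio=0.1)
sparse_weights = weights * mask  # pruned shared backbone weights
print(f"density after pruning: {mask.float().mean():.3f}")
```

Note that because the per-task masks are combined by union, the resulting density can exceed keep_ratio when the tasks disagree; how much they overlap is exactly the kind of task-relatedness signal the abstract's "watershed" observation refers to.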