{"title":"A Survey on Deep Neural Network Pruning: Taxonomy, Comparison, Analysis, and Recommendations.","authors":"Hongrong Cheng, Miao Zhang, Javen Qinfeng Shi","doi":"10.1109/TPAMI.2024.3447085","DOIUrl":null,"url":null,"abstract":"<p><p>Modern deep neural networks, particularly recent large language models, come with massive model sizes that require significant computational and storage resources. To enable the deployment of modern models on resource-constrained environments and to accelerate inference time, researchers have increasingly explored pruning techniques as a popular research direction in neural network compression. More than three thousand pruning papers have been published from 2020 to 2024. However, there is a dearth of up-to-date comprehensive review papers on pruning. To address this issue, in this survey, we provide a comprehensive review of existing research works on deep neural network pruning in a taxonomy of 1) universal/specific speedup, 2) when to prune, 3) how to prune, and 4) fusion of pruning and other compression techniques. We then provide a thorough comparative analysis of eight pairs of contrast settings for pruning (e.g., unstructured/structured, one-shot/iterative, data-free/data-driven, initialized/pre-trained weights, etc.) and explore several emerging topics, including pruning for large language models, vision transformers, diffusion models, and large multimodal models, post-training pruning, and different levels of supervision for pruning to shed light on the commonalities and differences of existing methods and lay the foundation for further method development. Finally, we provide some valuable recommendations on selecting pruning methods and prospect several promising research directions for neural network pruning. To facilitate future research on deep neural network pruning, we summarize broad pruning applications (e.g., adversarial robustness, natural language understanding, etc.) and build a curated collection of datasets, networks, and evaluations on different applications. We maintain a repository on https://github.com/hrcheng1066/awesome-pruning that serves as a comprehensive resource for neural network pruning papers and corresponding open-source codes. We will keep updating this repository to include the latest advancements in the field.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TPAMI.2024.3447085","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/11/6 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Modern deep neural networks, particularly recent large language models, come with massive model sizes that require significant computational and storage resources. To enable the deployment of modern models in resource-constrained environments and to accelerate inference, researchers have increasingly explored pruning, a popular research direction in neural network compression: more than three thousand pruning papers were published between 2020 and 2024. However, there is a dearth of up-to-date, comprehensive review papers on pruning. To address this gap, this survey provides a comprehensive review of existing research on deep neural network pruning, organized in a taxonomy of 1) universal/specific speedup, 2) when to prune, 3) how to prune, and 4) fusion of pruning with other compression techniques. We then provide a thorough comparative analysis of eight pairs of contrasting settings for pruning (e.g., unstructured/structured, one-shot/iterative, data-free/data-driven, initialized/pre-trained weights) and explore several emerging topics, including pruning for large language models, vision transformers, diffusion models, and large multimodal models, post-training pruning, and different levels of supervision for pruning. These analyses shed light on the commonalities and differences among existing methods and lay the foundation for further method development. Finally, we offer recommendations on selecting pruning methods and outline several promising research directions for neural network pruning. To facilitate future research, we summarize broad pruning applications (e.g., adversarial robustness, natural language understanding) and build a curated collection of datasets, networks, and evaluations across different applications. We maintain a repository at https://github.com/hrcheng1066/awesome-pruning that serves as a comprehensive resource for neural network pruning papers and corresponding open-source code, and we will keep updating it with the latest advances in the field.
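
As a concrete illustration of the unstructured/structured contrast surveyed above, the short sketch below (not drawn from the paper itself) applies magnitude-based pruning with PyTorch's torch.nn.utils.prune utilities: unstructured pruning zeroes individual low-magnitude weights, while structured pruning removes entire output neurons (rows of the weight matrix), producing regular sparsity.

    import torch.nn as nn
    import torch.nn.utils.prune as prune

    # Two identical layers, pruned in the two contrasting ways.
    unstructured_layer = nn.Linear(128, 64)
    structured_layer = nn.Linear(128, 64)

    # Unstructured: zero the 30% of individual weights with the smallest |w|.
    prune.l1_unstructured(unstructured_layer, name="weight", amount=0.3)

    # Structured: remove 50% of output neurons (rows, dim=0), ranked by L2 norm (n=2).
    prune.ln_structured(structured_layer, name="weight", amount=0.5, n=2, dim=0)

    # Make the pruning masks permanent and inspect the resulting sparsity.
    for layer in (unstructured_layer, structured_layer):
        prune.remove(layer, "weight")
        print((layer.weight == 0).float().mean().item())

In practice, the structured variant is the one that translates directly into wall-clock speedups on dense hardware, since whole rows or filters can be dropped from the computation; unstructured sparsity usually needs sparse kernels or dedicated hardware to realize comparable gains.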