Performance characterization and optimization of pruning patterns for sparse DNN inference

BenchCouncil Transactions on Benchmarks, Standards and Evaluations. Pub Date: 2022-10-01. DOI: 10.1016/j.tbench.2023.100090

Yunjie Liu, Jingwei Sun, Jiaqiang Liu, Guangzhong Sun

Volume 2, Issue 4, Article 100090. Full text: https://www.sciencedirect.com/science/article/pii/S2772485923000078

Abstract

Deep neural networks suffer from over-parameterization, which leads to high storage and computation costs. Pruning can effectively reduce these costs by eliminating redundant parameters. Among existing pruning methods, filter pruning yields more efficient inference, while element-wise pruning better preserves accuracy. To trade off between these two extremes, a variety of pruning patterns have been proposed. This study analyzes the performance characteristics of sparse DNNs pruned with different patterns, including element-wise, vector-wise, block-wise, and group-wise. Based on this analysis, we propose an efficient implementation of group-wise sparse DNN inference that makes better use of GPUs. Experimental results on VGG, ResNet, BERT, and ViT show that our optimized group-wise pruning pattern achieves much lower inference latency on GPUs than the other sparse patterns and than the existing group-wise implementation.
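To make the spectrum between element-wise and filter pruning more concrete, here is a minimal Python sketch of magnitude-based pruning at different granularities. It is not the authors' implementation. The helpers `tile_mask`, `zero_fraction`, and `empty_blocks`, the L2-norm scoring rule, and the horizontal orientation of the vectors are all assumptions made for illustration. Group-wise pruning is deliberately omitted because the abstract does not define it, and the paper's definitions of the other patterns may differ from the ones assumed here.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]
Mask = List[List[int]]

# Hypothetical helpers for illustration only; not taken from the paper.


def tile_mask(w: Matrix, tile_rows: int, tile_cols: int, sparsity: float) -> Mask:
    """Magnitude pruning at tile granularity.

    Each tile_rows x tile_cols tile is scored by its L2 norm, and the lowest-scoring
    fraction `sparsity` of tiles is zeroed as a unit. The tile shape picks the pattern:
      1 x 1       -> element-wise
      1 x v       -> vector-wise (horizontal vectors assumed)
      b x b       -> block-wise
      1 x n_cols  -> whole rows, i.e. filter pruning when rows are output channels
    """
    rows, cols = len(w), len(w[0])
    if rows % tile_rows or cols % tile_cols:
        raise ValueError("tile shape must evenly divide the matrix")
    scored: List[Tuple[float, int, int]] = []
    for r0 in range(0, rows, tile_rows):
        for c0 in range(0, cols, tile_cols):
            sq = sum(w[r][c] ** 2
                     for r in range(r0, r0 + tile_rows)
                     for c in range(c0, c0 + tile_cols))
            scored.append((math.sqrt(sq), r0, c0))
    scored.sort()  # weakest tiles first
    mask = [[1] * cols for _ in range(rows)]
    for _, r0, c0 in scored[: int(sparsity * len(scored))]:
        for r in range(r0, r0 + tile_rows):
            for c in range(c0, c0 + tile_cols):
                mask[r][c] = 0
    return mask


def zero_fraction(mask: Mask) -> float:
    """Fraction of weights removed by the mask."""
    total = sum(len(row) for row in mask)
    return sum(row.count(0) for row in mask) / total


def empty_blocks(mask: Mask, b: int) -> int:
    """Count b x b tiles that are entirely zero (work a tiled kernel could skip)."""
    count = 0
    for r0 in range(0, len(mask), b):
        for c0 in range(0, len(mask[0]), b):
            if all(mask[r][c] == 0
                   for r in range(r0, r0 + b)
                   for c in range(c0, c0 + b)):
                count += 1
    return count


# An 8 x 8 toy weight matrix (rows play the role of filters / output channels).
rng = random.Random(0)
w = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(8)]

patterns = {
    "element-wise": tile_mask(w, 1, 1, 0.5),
    "vector-wise": tile_mask(w, 1, 4, 0.5),
    "block-wise": tile_mask(w, 4, 4, 0.5),
    "filter": tile_mask(w, 1, 8, 0.5),
}
for name, m in patterns.items():
    print(f"{name:13s} sparsity={zero_fraction(m):.2f} "
          f"empty 4x4 blocks={empty_blocks(m, 4)}")
```

All four masks remove exactly half of the weights, but only the block-wise mask is guaranteed to empty whole 4x4 tiles. This suggests why coarser patterns map more naturally onto tiled GPU kernels, while the element-wise mask keeps the finest control over which individual weights survive, consistent with the accuracy advantage the abstract attributes to element-wise pruning.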
