Performance characterization and optimization of pruning patterns for sparse DNN inference
Yunjie Liu, Jingwei Sun, Jiaqiang Liu, Guangzhong Sun
{"title":"稀疏DNN推理中剪枝模式的性能表征与优化","authors":"Yunjie Liu, Jingwei Sun, Jiaqiang Liu, Guangzhong Sun","doi":"10.1016/j.tbench.2023.100090","DOIUrl":null,"url":null,"abstract":"<div><p>Deep neural networks are suffering from over parameterized high storage and high consumption problems. Pruning can effectively reduce storage and computation costs of deep neural networks by eliminating their redundant parameters. In existing pruning methods, filter pruning achieves more efficient inference, while element-wise pruning maintains better accuracy. To make a trade-off between the two endpoints, a variety of pruning patterns has been proposed. This study analyzes the performance characteristics of sparse DNNs pruned by different patterns, including element-wise, vector-wise, block-wise, and group-wise. Based on the analysis, we propose an efficient implementation of group-wise sparse DNN inference, which can make better use of GPUs. Experimental results on VGG, ResNet, BERT and ViT show that our optimized group-wise pruning pattern achieves much lower inference latency on GPU than other sparse patterns and the existing group-wise pattern implementation.</p></div>","PeriodicalId":100155,"journal":{"name":"BenchCouncil Transactions on Benchmarks, Standards and Evaluations","volume":"2 4","pages":"Article 100090"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2772485923000078/pdfft?md5=47f436d7570515bb39cfffeda4376c89&pid=1-s2.0-S2772485923000078-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Performance characterization and optimization of pruning patterns for sparse DNN inference\",\"authors\":\"Yunjie Liu, Jingwei Sun, Jiaqiang Liu, Guangzhong Sun\",\"doi\":\"10.1016/j.tbench.2023.100090\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Deep neural networks are suffering from over parameterized high storage and high consumption problems. Pruning can effectively reduce storage and computation costs of deep neural networks by eliminating their redundant parameters. In existing pruning methods, filter pruning achieves more efficient inference, while element-wise pruning maintains better accuracy. To make a trade-off between the two endpoints, a variety of pruning patterns has been proposed. This study analyzes the performance characteristics of sparse DNNs pruned by different patterns, including element-wise, vector-wise, block-wise, and group-wise. Based on the analysis, we propose an efficient implementation of group-wise sparse DNN inference, which can make better use of GPUs. 
Experimental results on VGG, ResNet, BERT and ViT show that our optimized group-wise pruning pattern achieves much lower inference latency on GPU than other sparse patterns and the existing group-wise pattern implementation.</p></div>\",\"PeriodicalId\":100155,\"journal\":{\"name\":\"BenchCouncil Transactions on Benchmarks, Standards and Evaluations\",\"volume\":\"2 4\",\"pages\":\"Article 100090\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S2772485923000078/pdfft?md5=47f436d7570515bb39cfffeda4376c89&pid=1-s2.0-S2772485923000078-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BenchCouncil Transactions on Benchmarks, Standards and Evaluations\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2772485923000078\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BenchCouncil Transactions on Benchmarks, Standards and Evaluations","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772485923000078","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Performance characterization and optimization of pruning patterns for sparse DNN inference
Abstract: Deep neural networks suffer from over-parameterization, which leads to high storage and computation costs. Pruning can effectively reduce these costs by eliminating redundant parameters. Among existing pruning methods, filter pruning achieves more efficient inference, while element-wise pruning maintains better accuracy. To trade off between these two endpoints, a variety of pruning patterns have been proposed. This study analyzes the performance characteristics of sparse DNNs pruned with different patterns, including element-wise, vector-wise, block-wise, and group-wise. Based on this analysis, we propose an efficient implementation of group-wise sparse DNN inference that makes better use of GPUs. Experimental results on VGG, ResNet, BERT, and ViT show that our optimized group-wise pruning pattern achieves much lower GPU inference latency than other sparse patterns and the existing group-wise pattern implementation.
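To make the pattern taxonomy concrete, the sketch below contrasts element-wise magnitude pruning with a group-wise pattern on a single dense weight matrix. It is a minimal NumPy illustration, not the paper's method: the group shape (a contiguous run of `group_size` weights along the input dimension, scored by L2 norm) is an assumption for illustration, and the paper's exact group definition and GPU kernels are not reproduced here.

```python
# Minimal sketch: element-wise vs. group-wise magnitude pruning.
# Assumption (not from the paper): a "group" is a contiguous run of
# `group_size` weights along a row, pruned whole by its L2 norm.
import numpy as np

def prune_element_wise(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero the smallest-magnitude individual weights (unstructured)."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

def prune_group_wise(w: np.ndarray, sparsity: float, group_size: int = 4) -> np.ndarray:
    """Zero whole groups of consecutive weights per row, ranked by group
    L2 norm, so the surviving nonzeros stay clustered in memory."""
    rows, cols = w.shape
    assert cols % group_size == 0, "columns must divide evenly into groups"
    groups = w.reshape(rows, cols // group_size, group_size)
    norms = np.linalg.norm(groups, axis=2)  # one importance score per group
    k = int(sparsity * norms.size)
    if k:
        threshold = np.partition(norms.ravel(), k - 1)[k - 1]
        groups = np.where((norms <= threshold)[..., None], 0.0, groups)
    return groups.reshape(rows, cols)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((8, 16)).astype(np.float32)
    for name, pruned in [("element-wise", prune_element_wise(w, 0.75)),
                         ("group-wise ", prune_group_wise(w, 0.75))]:
        print(f"{name}: {np.mean(pruned == 0):.0%} zeros")
```

The practical difference the abstract alludes to: element-wise pruning scatters nonzeros arbitrarily, which hurts GPU memory coalescing, while group-wise (and block-wise) patterns keep nonzeros in fixed-size contiguous chunks that a kernel can load and multiply with regular, dense inner loops; this regularity is generally what makes structured patterns faster at the same sparsity level.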