Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration

Yifan Gong, Geng Yuan, Zheng Zhan, Wei Niu, Zhengang Li, Pu Zhao, Yuxuan Cai, Sijia Liu, Bin Ren, Xue Lin, Xulong Tang, Yanzhi Wang
{"title":"Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration","authors":"Yifan Gong, Geng Yuan, Zheng Zhan, Wei Niu, Zhengang Li, Pu Zhao, Yuxuan Cai, Sijia Liu, Bin Ren, Xue Lin, Xulong Tang, Yanzhi Wang","doi":"10.1145/3495532","DOIUrl":null,"url":null,"abstract":"Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices. However, prior pruning schemes have limited application scenarios due to accuracy degradation, difficulty in leveraging hardware acceleration, and/or restriction on certain types of DNN layers. In this article, we propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations that are applicable to any type of DNN layer while achieving high accuracy and hardware inference performance. With the flexibility of applying different pruning schemes to different layers enabled by our compiler optimizations, we further probe into the new problem of determining the best-suited pruning scheme considering the different acceleration and accuracy performance of various pruning schemes. Two pruning scheme mapping methods—one -search based and the other is rule based—are proposed to automatically derive the best-suited pruning regularity and block size for each layer of any given DNN. Experimental results demonstrate that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework with up to 2.48 \\( \\times \\) and 1.73 \\( \\times \\) DNN inference acceleration on CIFAR-10 and ImageNet datasets without accuracy loss.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"5 1","pages":"1 - 26"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3495532","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5

Abstract

Weight pruning is an effective model compression technique for tackling the challenge of achieving real-time deep neural network (DNN) inference on mobile devices. However, prior pruning schemes have limited application scenarios due to accuracy degradation, difficulty in leveraging hardware acceleration, and/or restriction to certain types of DNN layers. In this article, we propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations that are applicable to any type of DNN layer while achieving high accuracy and hardware inference performance. With the flexibility of applying different pruning schemes to different layers enabled by our compiler optimizations, we further probe into the new problem of determining the best-suited pruning scheme for each layer, considering the different acceleration and accuracy performance of the various pruning schemes. Two pruning scheme mapping methods, one search-based and the other rule-based, are proposed to automatically derive the best-suited pruning regularity and block size for each layer of any given DNN. Experimental results demonstrate that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework with up to 2.48× and 1.73× DNN inference acceleration on the CIFAR-10 and ImageNet datasets without accuracy loss.
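The fine-grained structured pruning described above operates at the granularity of blocks within a layer's weight matrix. As a minimal illustrative sketch (not the paper's implementation), the Python function below zeroes out the lowest-magnitude blocks of a 2-D weight matrix; the block size, keep ratio, and L2 scoring are assumptions chosen for illustration, since the paper derives the best-suited block size per layer automatically.

```python
import numpy as np

def block_prune(weight, block_size=(4, 4), keep_ratio=0.5):
    """Zero out low-importance blocks of a 2-D weight matrix.

    Blocks are scored by their L2 norm and only the top `keep_ratio`
    fraction is kept. Illustrative only: block_size and keep_ratio
    would be chosen per layer by a mapping method in practice.
    """
    rows, cols = weight.shape
    br, bc = block_size
    assert rows % br == 0 and cols % bc == 0, "sketch assumes divisible shapes"

    # View the matrix as a (rows/br) x (cols/bc) grid of br x bc blocks.
    blocks = weight.reshape(rows // br, br, cols // bc, bc)
    scores = np.sqrt((blocks ** 2).sum(axis=(1, 3)))  # L2 norm per block

    # Keep the k highest-scoring blocks, zero the rest.
    k = max(1, int(keep_ratio * scores.size))
    threshold = np.partition(scores.ravel(), -k)[-k]
    mask = (scores >= threshold)[:, None, :, None]   # broadcast over blocks
    return (blocks * mask).reshape(rows, cols)
```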
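The two mapping methods assign a pruning regularity and block size to each layer. A hypothetical sketch of the search-based idea follows: enumerate candidate block sizes per layer and keep the fastest one whose estimated accuracy impact stays within budget. The function names, the accuracy proxy, and the fallback rule are all assumptions for illustration, not the paper's actual algorithm.

```python
def map_pruning_schemes(layers, candidates, measure_latency, accuracy_proxy,
                        max_accuracy_drop=0.01):
    """Pick a per-layer block size: the lowest-latency candidate whose
    estimated accuracy drop fits the budget.

    `measure_latency(layer, block)` and `accuracy_proxy(layer, block)` are
    hypothetical stand-ins for on-device profiling and a layer-sensitivity
    estimate, respectively.
    """
    mapping = {}
    for layer in layers:
        best = None
        for block in candidates:
            if accuracy_proxy(layer, block) > max_accuracy_drop:
                continue  # scheme too damaging for this layer
            latency = measure_latency(layer, block)
            if best is None or latency < best[1]:
                best = (block, latency)
        if best is None:
            # No candidate fits the budget: fall back to the least-damaging one.
            block = min(candidates, key=lambda b: accuracy_proxy(layer, b))
            best = (block, measure_latency(layer, block))
        mapping[layer] = best[0]
    return mapping
```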