All-in-One: A Highly Representative DNN Pruning Framework for Edge Devices with Dynamic Power Management

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD), Pub Date: 2022-10-29, DOI: 10.1145/3508352.3549379

Yifan Gong, Zheng Zhan, Pu Zhao, Yushu Wu, Chaoan Wu, Caiwen Ding, Weiwen Jiang, Minghai Qin, Yanzhi Wang

Citations: 2

Abstract

When deploying deep neural networks (DNNs) on edge devices, many research efforts focus on limited hardware resources. However, little attention has been paid to the influence of dynamic power management. Edge devices typically run on a limited battery energy budget, unlike servers or workstations with an almost unlimited energy supply. Their dynamic power management therefore often changes the execution frequency, as in the widely used dynamic voltage and frequency scaling (DVFS) technique. This leads to highly unstable inference speed, especially for computation-intensive DNN models, which can harm user experience and waste hardware resources. We first identify this problem and then propose All-in-One, a highly representative pruning framework that works with DVFS-based dynamic power management. The framework uses only one set of model weights and soft masks (together with other auxiliary parameters of negligible storage) to represent multiple models with various pruning ratios. By reconfiguring the model to the pruning ratio corresponding to a specific execution frequency (and voltage), we achieve stable inference speed, i.e., we keep the difference in speed performance across execution frequencies as small as possible. Our experiments demonstrate that our method not only achieves high accuracy for multiple models with different pruning ratios but also reduces the variance of their inference latency across frequencies, while consuming the memory of only one model and one soft mask.
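The abstract does not specify how the soft masks are trained or how a pruning ratio is matched to a frequency. As a rough illustration only, the Python sketch below makes two loud assumptions. First, each weight carries one soft-mask score, and the sub-model at pruning ratio `r` keeps the top `(1 - r)` fraction of weights by that score, so every sub-model is derived from the same weights and scores. Second, latency scales as `(1 - r) * work / f`, so at frequency `f` a ratio of about `1 - f / f_max` keeps latency near that of the dense model at `f_max`. The function names, the magnitude-based stand-in scores, and the linear latency model are hypothetical and are not the authors' method.

```python
"""Hypothetical sketch: one weight vector plus one soft-mask score vector
serving several pruning ratios, with the ratio chosen per DVFS frequency.
The scoring rule and latency model are illustrative assumptions only."""
from math import floor


def mask_for_ratio(scores, ratio):
    """Return a 0/1 mask keeping the highest-scoring (1 - ratio) fraction.

    All ratios use the same (stable) ranking, so the weights kept at a larger
    ratio are always a subset of those kept at a smaller ratio.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    n = len(scores)
    keep = n - floor(ratio * n)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    kept = set(order[:keep])
    return [1 if i in kept else 0 for i in range(n)]


def ratio_for_frequency(freq_mhz, max_freq_mhz, supported_ratios):
    """Pick the smallest supported ratio whose assumed latency,
    (1 - ratio) * work / freq, does not exceed the dense model's
    latency at max_freq_mhz."""
    needed = 1.0 - freq_mhz / max_freq_mhz
    candidates = [r for r in sorted(supported_ratios) if r >= needed - 1e-9]
    if not candidates:
        raise ValueError("no supported ratio meets the latency target")
    return candidates[0]


def apply_mask(weights, mask):
    """Zero out pruned weights; the stored dense weights stay unchanged."""
    return [w * m for w, m in zip(weights, mask)]


if __name__ == "__main__":
    weights = [0.8, -0.1, 0.5, 0.05, -0.9, 0.3, 0.02, 0.6]
    scores = [abs(w) for w in weights]  # hypothetical stand-in for learned soft-mask scores
    ratios = [0.0, 0.25, 0.5, 0.75]
    for f in (2000, 1500, 1000, 500):
        r = ratio_for_frequency(f, 2000, ratios)
        m = mask_for_ratio(scores, r)
        print(f, r, sum(m), apply_mask(weights, m))
```

Because only the score vector and a small table of supported ratios are stored, switching frequency in this sketch costs just recomputing a mask. This mirrors, in simplified form, the abstract's claim that one model and one soft mask cover all pruning ratios.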
