Best of both, Structured and Unstructured Sparsity in Neural Networks

Christoph Schulte, Sven Wagner, Armin Runge, Dimitrios Bariamis, B. Hammer
{"title":"神经网络中的结构化和非结构化稀疏性","authors":"Christoph Schulte, Sven Wagner, Armin Runge, Dimitrios Bariamis, B. Hammer","doi":"10.1145/3578356.3592583","DOIUrl":null,"url":null,"abstract":"Besides quantization, pruning has shown to be one of the most effective methods to reduce the inference time and required energy of Deep Neural Networks (DNNs). In this work, we propose a sparsity definition that reflects the number of saved operations by pruned parameters to guide the pruning process in order to save as many operations as possible. Based on this, we show the importance of the baseline model's size and quantify the overhead of unstructured sparsity for a commercial-of-the-shelf AI Hardware Accelerator (HWA) in terms of latency reductions. Furthermore, we show that a combination of both structured and unstructured sparsity can mitigate this effect.","PeriodicalId":370204,"journal":{"name":"Proceedings of the 3rd Workshop on Machine Learning and Systems","volume":"103 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Best of both, Structured and Unstructured Sparsity in Neural Networks\",\"authors\":\"Christoph Schulte, Sven Wagner, Armin Runge, Dimitrios Bariamis, B. Hammer\",\"doi\":\"10.1145/3578356.3592583\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Besides quantization, pruning has shown to be one of the most effective methods to reduce the inference time and required energy of Deep Neural Networks (DNNs). In this work, we propose a sparsity definition that reflects the number of saved operations by pruned parameters to guide the pruning process in order to save as many operations as possible. Based on this, we show the importance of the baseline model's size and quantify the overhead of unstructured sparsity for a commercial-of-the-shelf AI Hardware Accelerator (HWA) in terms of latency reductions. Furthermore, we show that a combination of both structured and unstructured sparsity can mitigate this effect.\",\"PeriodicalId\":370204,\"journal\":{\"name\":\"Proceedings of the 3rd Workshop on Machine Learning and Systems\",\"volume\":\"103 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 3rd Workshop on Machine Learning and Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3578356.3592583\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 3rd Workshop on Machine Learning and Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3578356.3592583","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Besides quantization, pruning has been shown to be one of the most effective methods for reducing the inference time and energy consumption of Deep Neural Networks (DNNs). In this work, we propose a sparsity definition that reflects the number of operations saved by pruned parameters, and use it to guide the pruning process so that as many operations as possible are saved. Based on this, we show the importance of the baseline model's size and quantify the overhead of unstructured sparsity on a commercial off-the-shelf AI Hardware Accelerator (HWA) in terms of latency reductions. Furthermore, we show that a combination of structured and unstructured sparsity can mitigate this effect.
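The abstract does not give the sparsity definition explicitly, but the idea of counting operations saved rather than parameters removed can be illustrated with a short sketch. Below is a minimal Python example under one plausible assumption: each weight is weighted by the number of multiply-accumulate operations (MACs) it contributes per forward pass, so a zeroed convolution weight, which is applied at every output position, counts for far more than a zeroed dense weight. The function name `operation_sparsity` and the `(weights, ops_per_weight)` representation are hypothetical, not taken from the paper.

```python
import numpy as np

def operation_sparsity(layers):
    """Fraction of MACs removed by zeroed weights across all layers.

    `layers` is a list of (weights, ops_per_weight) pairs, where
    `ops_per_weight` is how many MACs a single weight contributes to one
    forward pass (e.g. H_out * W_out for a conv layer, 1 for a dense layer).
    This weighting scheme is an illustrative assumption, not the paper's
    exact formula.
    """
    saved_ops, total_ops = 0, 0
    for weights, ops_per_weight in layers:
        total_ops += weights.size * ops_per_weight
        saved_ops += np.count_nonzero(weights == 0) * ops_per_weight
    return saved_ops / total_ops

# Usage: a 3x3 conv (16 -> 32 channels) with a 28x28 output map, plus a dense layer.
rng = np.random.default_rng(0)
conv_w = rng.standard_normal((32, 16, 3, 3))
conv_w[np.abs(conv_w) < 0.8] = 0.0   # unstructured magnitude pruning
dense_w = rng.standard_normal((10, 128))
dense_w[:, ::2] = 0.0                # structured pruning: every other input column
print(operation_sparsity([(conv_w, 28 * 28), (dense_w, 1)]))
```

On such a measure, zeroing a convolution weight raises sparsity far more than zeroing a dense weight, which is exactly the distinction a plain parameter-count definition misses.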