SWIFT: Small-World-based Structural Pruning to Accelerate DNN Inference on FPGA

Yufei Ma, Gokul Krishnan, Yu Cao, Le Ye, Ru Huang
DOI: 10.1145/3431920.3439465
Venue: The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
Published: 2021-02-17
Citations: 1

Abstract

State-of-the-art DNN pruning approaches achieve high sparsity. However, these methods usually do not consider the intrinsic graph properties of DNNs, leading to irregularly pruned networks. Consequently, hardware accelerators cannot directly benefit from such pruning and incur additional cost in indexing, control, and data paths. Inspired by the observation that the brain and many real-world networks follow a Small-World model, we propose a graph-based progressive structural pruning technique, SWIFT, that integrates local clusters and global sparsity in DNNs to improve the dataflow and workload balance of accelerators. In particular, we propose an output-stationary FPGA architecture to accelerate DNN inference and integrate it with the structural sparsity produced by SWIFT, so that the communication and computation for clustered zero weights are eliminated. In addition, a full-mesh data router is designed to adaptively direct inputs to the corresponding processing elements (PEs) for different layer configurations and to skip zero operations. SWIFT is evaluated with multiple DNNs on different datasets, achieving sparsity ratios of up to 76% on CIFAR-10, 83% on CIFAR-100, and 76% on SVHN. Moreover, the proposed SWIFT FPGA accelerator achieves up to a 4.4× improvement in throughput on different dense networks with marginal hardware overhead.
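The abstract only sketches the idea of combining local clusters with a few global links. As a hedged illustration (the paper's exact algorithm is not given here), the following NumPy sketch builds a small-world connectivity mask in the Watts–Strogatz style — each output unit keeps k local inputs, and each local link is rewired to a random global input with probability p — and applies it as a structural pruning mask. All function names and parameters are our own for illustration.

```python
import numpy as np

def small_world_mask(n_out, n_in, k=4, p=0.1, rng=None):
    """Binary connectivity mask with small-world structure (assumed sketch):
    each output unit keeps k clustered (local) inputs near the diagonal;
    with probability p a local link is rewired to a random global input."""
    rng = np.random.default_rng(rng)
    mask = np.zeros((n_out, n_in), dtype=bool)
    for o in range(n_out):
        center = int(o * n_in / n_out)  # align local clusters along the diagonal
        local = [(center + d) % n_in for d in range(-(k // 2), k - k // 2)]
        for i in local:
            if rng.random() < p:        # rewire: replace local link with a global shortcut
                i = int(rng.integers(n_in))
            mask[o, i] = True
    return mask

def prune(weights, mask):
    """Zero out all weights outside the mask (structural pruning)."""
    return np.where(mask, weights, 0.0)

# Example: prune a 64x128 dense layer down to k = 8 connections per output.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))
M = small_world_mask(64, 128, k=8, p=0.1, rng=0)
Wp = prune(W, M)
sparsity = 1.0 - np.count_nonzero(Wp) / Wp.size
print(f"sparsity: {sparsity:.2%}")
```

Because the surviving weights cluster into contiguous bands per output, an accelerator can skip whole zero blocks instead of tracking per-element indices — the property the SWIFT dataflow exploits.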