{"title":"基于小世界结构剪枝的高效FPGA深度神经网络推理","authors":"Gokul Krishnan, Yufei Ma, Yu Cao","doi":"10.1109/ICSICT49897.2020.9278024","DOIUrl":null,"url":null,"abstract":"DNN pruning approaches usually trim model parameters without exploiting the intrinsic graph properties and hardware preferences. As a result, an FPGA accelerator may not directly benefit from such random pruning, with additional cost on indexing and control modules. Inspired by the observation that the brain and real-world networks follow a Small-World model, we propose a graph-based progressive structural pruning technique that integrates local clusters and global sparsity in the Small-World graph and the data locality in the FPGA dataflow. The proposed technique hierarchically trims the DNN into a sparse graph before training, which follows both the Small-World property and FPGA dataflow preferences, such as grouped non-zero and zero parameters to skip data load and corresponding computation. The pruned model is then trained for a given dataset and fine-tuned to achieve the best accuracy. We evaluate the proposed technique for multiple DNNs with different datasets. It achieves state-of-the-art sparsity ratio of up to 76% for CIFAR-10, 84% for CIFAR-100, and 76% for the SVHN datasets. Moreover, the generated sparse DNN achieves up to 4× improvement in throughput for an output stationary FPGA architecture across different DNNs with a marginal hardware overhead.","PeriodicalId":6727,"journal":{"name":"2020 IEEE 15th International Conference on Solid-State & Integrated Circuit Technology (ICSICT)","volume":"198 1","pages":"1-5"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Small-world-based Structural Pruning for Efficient FPGA Inference of Deep Neural Networks\",\"authors\":\"Gokul Krishnan, Yufei Ma, Yu Cao\",\"doi\":\"10.1109/ICSICT49897.2020.9278024\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"DNN pruning approaches usually trim model parameters without exploiting the intrinsic graph properties and hardware preferences. As a result, an FPGA accelerator may not directly benefit from such random pruning, with additional cost on indexing and control modules. Inspired by the observation that the brain and real-world networks follow a Small-World model, we propose a graph-based progressive structural pruning technique that integrates local clusters and global sparsity in the Small-World graph and the data locality in the FPGA dataflow. The proposed technique hierarchically trims the DNN into a sparse graph before training, which follows both the Small-World property and FPGA dataflow preferences, such as grouped non-zero and zero parameters to skip data load and corresponding computation. The pruned model is then trained for a given dataset and fine-tuned to achieve the best accuracy. We evaluate the proposed technique for multiple DNNs with different datasets. It achieves state-of-the-art sparsity ratio of up to 76% for CIFAR-10, 84% for CIFAR-100, and 76% for the SVHN datasets. 
Moreover, the generated sparse DNN achieves up to 4× improvement in throughput for an output stationary FPGA architecture across different DNNs with a marginal hardware overhead.\",\"PeriodicalId\":6727,\"journal\":{\"name\":\"2020 IEEE 15th International Conference on Solid-State & Integrated Circuit Technology (ICSICT)\",\"volume\":\"198 1\",\"pages\":\"1-5\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE 15th International Conference on Solid-State & Integrated Circuit Technology (ICSICT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSICT49897.2020.9278024\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 15th International Conference on Solid-State & Integrated Circuit Technology (ICSICT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSICT49897.2020.9278024","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Small-world-based Structural Pruning for Efficient FPGA Inference of Deep Neural Networks
DNN pruning approaches usually trim model parameters without exploiting the network's intrinsic graph properties or the hardware's preferences. As a result, an FPGA accelerator may not directly benefit from such random pruning and instead incurs additional cost in indexing and control modules. Inspired by the observation that the brain and real-world networks follow a Small-World model, we propose a graph-based progressive structural pruning technique that integrates the local clustering and global sparsity of the Small-World graph with the data locality of the FPGA dataflow. The proposed technique hierarchically trims the DNN into a sparse graph before training, following both the Small-World property and FPGA dataflow preferences, such as grouping non-zero and zero parameters so that data loads and the corresponding computation can be skipped. The pruned model is then trained on a given dataset and fine-tuned to achieve the best accuracy. We evaluate the proposed technique on multiple DNNs across different datasets. It achieves state-of-the-art sparsity ratios of up to 76% on CIFAR-10, 84% on CIFAR-100, and 76% on SVHN. Moreover, the generated sparse DNN achieves up to a 4× improvement in throughput on an output-stationary FPGA architecture across different DNNs, with marginal hardware overhead.
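
To make the grouped small-world sparsity idea concrete, below is a minimal sketch (not the authors' implementation) of how a Watts-Strogatz-style connectivity mask with dense local clusters and a few long-range shortcuts might be generated for a single fully connected layer, then coarsened to block granularity before training so that zero and non-zero weights stay grouped for an FPGA dataflow. The neighborhood width k, rewiring probability p, and block size are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of a small-world, block-grouped pruning mask for one
# fully connected layer. Hyperparameters here are illustrative assumptions.
import numpy as np

def small_world_mask(n_out, n_in, k=8, p=0.1, block=4, seed=0):
    """Binary mask with local clusters (ring-lattice neighborhoods) plus a few
    long-range links, coarsened to block granularity so zero / non-zero weights
    stay grouped and whole blocks of loads and MACs can be skipped."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n_out, n_in), dtype=np.float32)

    # Local clustering: each output neuron connects to roughly its k nearest
    # inputs around an aligned center (ring-lattice neighborhood).
    for i in range(n_out):
        center = int(round(i * n_in / n_out))
        for offset in range(-k // 2, k // 2 + 1):
            mask[i, (center + offset) % n_in] = 1.0

    # Global sparsity: rewire a fraction p of local links to random distant
    # inputs, mimicking small-world long-range shortcuts.
    rows, cols = np.nonzero(mask)
    for r, c in zip(rows, cols):
        if rng.random() < p:
            mask[r, c] = 0.0
            mask[r, rng.integers(n_in)] = 1.0

    # Coarsen to block granularity: keep a whole block if any connection falls
    # inside it, so sparsity is expressed in hardware-friendly groups.
    coarse = mask.reshape(n_out // block, block, n_in // block, block)
    keep = coarse.max(axis=(1, 3), keepdims=True)
    return np.broadcast_to(keep, coarse.shape).reshape(n_out, n_in).copy()

# Usage: apply the mask to the weights before (and during) training.
mask = small_world_mask(64, 64)
weights = np.random.randn(64, 64).astype(np.float32) * mask
print(f"sparsity: {1 - mask.mean():.2%}")
```

Grouping at block granularity is what lets the accelerator skip whole data loads and their corresponding computation rather than individual multiply-accumulates, at the cost of a slightly denser mask than unconstrained element-wise rewiring would give.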