A Hybrid-Pipelined Architecture for FPGA-based Binary Weight DenseNet with High Performance-Efficiency

2020 IEEE High Performance Extreme Computing Conference (HPEC) | Pub Date: 2020-09-22 | DOI: 10.1109/HPEC43674.2020.9286185

Shihao Zeng, Yihua Huang


Citations: 1


A Hybrid-Pipelined Architecture for FPGA-based Binary Weight DenseNet with High Performance-Efficiency

DenseNet achieves remarkable performance on a variety of computer vision tasks with far fewer parameters and operations. However, few accelerator designs target DenseNet because of its dense-connectivity structure. In this paper, we apply the binary weight method to DenseNet and propose a hybrid-pipelined architecture for FPGA-based acceleration of the resulting binary weight DenseNet, which can be stored entirely on chip. To handle the dense connectivity, we develop a reusable convolution unit that efficiently supports both conv1×1 and conv3×3. Moreover, we propose a theoretical method for determining system parallelism that guides the top-level pipelined design toward maximum efficiency. To evaluate the proposed architecture, a binary weight DenseNet-100 model is trained on the CIFAR-10 dataset and implemented on a VX690T FPGA, at the cost of a 4.18% accuracy loss. Experiments show that our architecture achieves a throughput of 514 GOPS and 889 FPS at 200 MHz, with a performance-efficiency of up to 62.4%, outperforming most related work.
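The ideas in the abstract can be illustrated with a minimal software sketch. This is not the authors' hardware design. It assumes that the binary weight method means sign binarization to +1/-1 (the abstract does not say, and any scaling factor is omitted), and it only shows why such weights let a single loop nest serve both conv1×1 and conv3×3 using additions and subtractions. The names `binarize` and `binary_conv2d` are hypothetical. The last lines derive two figures from the numbers quoted in the abstract, assuming that performance-efficiency means achieved throughput divided by theoretical peak throughput, a definition the abstract does not give.

```python
def binarize(weights):
    """Map real-valued weights to +1/-1 by sign (zero is mapped to +1)."""
    return [1 if w >= 0 else -1 for w in weights]


def binary_conv2d(fmap, signs, k):
    """Same-padded k x k convolution producing one output channel.

    fmap:  input feature map indexed as fmap[c][y][x]
    signs: binarized kernel indexed as signs[c][dy][dx], entries +1 or -1
    k:     kernel size, 1 or 3 (the same loop nest serves both)
    """
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    pad = k // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for c in range(channels):
                for dy in range(k):
                    for dx in range(k):
                        yy, xx = y + dy - pad, x + dx - pad
                        if 0 <= yy < height and 0 <= xx < width:
                            # A +/-1 weight needs no multiplier: add or subtract.
                            if signs[c][dy][dx] > 0:
                                acc += fmap[c][yy][xx]
                            else:
                                acc -= fmap[c][yy][xx]
            out[y][x] = acc
    return out


# Figures quoted in the abstract.
THROUGHPUT_GOPS = 514.0
FRAMES_PER_SECOND = 889.0
PERFORMANCE_EFFICIENCY = 0.624

# Work per frame implied by the two throughput figures.
gop_per_frame = THROUGHPUT_GOPS / FRAMES_PER_SECOND

# Assumption: efficiency = achieved throughput / theoretical peak throughput.
implied_peak_gops = THROUGHPUT_GOPS / PERFORMANCE_EFFICIENCY

if __name__ == "__main__":
    fmap = [[[1.0, 2.0], [3.0, 4.0]]]            # 1 channel, 2 x 2 pixels
    print(binary_conv2d(fmap, [[[-1]]], k=1))    # conv1x1 path
    ones3 = [[[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
    print(binary_conv2d(fmap, ones3, k=3))       # conv3x3 path
    print(round(gop_per_frame, 3), round(implied_peak_gops, 1))
```

Under these assumptions, the quoted figures imply roughly 0.58 GOP of work per frame and a theoretical peak of roughly 824 GOPS. Both values are derived here and are not stated in the abstract.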
