Conghui Luo, Wen-Liang Huang, Dehao Xiang, Yihua Huang
2022 IEEE High Performance Extreme Computing Conference (HPEC), September 19, 2022
DOI: 10.1109/HPEC55821.2022.9926377
A High-performance Deployment Framework for Pipelined CNN Accelerators with Flexible DSE Strategy
Pipelined DCNN (Deep Convolutional Neural Network) accelerators can effectively exploit inter-layer parallelism, so they are widely used in applications such as video stream processing. However, the large volume of intermediate results generated in a pipelined accelerator places a considerable burden on the on-chip storage resources of FPGAs. To ease this storage demand, a storage-optimized design space exploration (DSE) method is proposed, at the cost of a slight drop in computing-resource utilization. Experimental results show that the DSE strategy achieves 98.49% and 98.00% CE (Computation Engine) utilization on VGG16 and ResNet101, respectively. In addition, the resource optimization strategy saves 27.84% of BRAM resources on VGG16, while CE utilization drops by only 3.04%. This paper also proposes an automated deployment framework that adapts to different networks with high computing-resource utilization and achieves workload balancing automatically by optimizing the computing-resource allocation of each layer.
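The workload-balancing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's actual DSE algorithm; it only shows one simple way to allocate a fixed budget of computation engines (CEs) across pipeline stages so that the per-stage latency (approximated here as workload divided by allocated CEs) is balanced. The layer workloads, the greedy strategy, and the function name are all illustrative assumptions.

```python
# Hypothetical sketch of pipeline workload balancing: repeatedly give the
# next CE to the currently slowest stage, so that the maximum stage latency
# (workload / allocated CEs) is minimized. Illustrative only; not the
# paper's DSE method.
import heapq

def balance_ces(workloads, total_ces):
    """Allocate total_ces across stages, one CE at a time, to the bottleneck."""
    n = len(workloads)
    assert total_ces >= n, "need at least one CE per layer"
    alloc = [1] * n
    # Max-heap (via negated keys) ordered by current stage latency.
    heap = [(-w / 1, i) for i, w in enumerate(workloads)]
    heapq.heapify(heap)
    for _ in range(total_ces - n):
        _, i = heapq.heappop(heap)   # slowest stage right now
        alloc[i] += 1
        heapq.heappush(heap, (-workloads[i] / alloc[i], i))
    return alloc

if __name__ == "__main__":
    # Assumed per-layer workloads (e.g. MAC counts in arbitrary units).
    workloads = [90, 30, 60, 20]
    alloc = balance_ces(workloads, 20)
    print(alloc)
    print(max(w / a for w, a in zip(workloads, alloc)))
```

With these example workloads the greedy allocation converges to a proportional split, so every pipeline stage ends up with the same latency, which is exactly the balanced-pipeline condition the abstract describes.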