Conghui Luo, Wen-Liang Huang, Dehao Xiang, Yihua Huang
2022 IEEE High Performance Extreme Computing Conference (HPEC), September 19, 2022
DOI: 10.1109/HPEC55821.2022.9926377
A High-performance Deployment Framework for Pipelined CNN Accelerators with Flexible DSE Strategy
Pipelined DCNN (Deep Convolutional Neural Network) accelerators can effectively exploit inter-layer parallelism, so they are widely used in applications such as video stream processing. However, the large volume of intermediate results generated in a pipelined accelerator places a considerable burden on the on-chip storage resources of FPGAs. To ease this storage demand, a storage-optimized design space exploration (DSE) method is proposed, at the cost of a slight drop in computing-resource utilization. Experimental results show that the DSE strategy achieves 98.49% and 98.00% CE (Computation Engine) utilization on VGG16 and ResNet101, respectively. In addition, the resource optimization strategy saves 27.84% of BRAM resources on VGG16, while CE utilization drops by only 3.04%. This paper also proposes an automated deployment framework that adapts to different networks with high computing-resource utilization and achieves workload balancing automatically by optimizing the computing-resource allocation of each layer.
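The workload-balancing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's actual DSE algorithm; it only shows one simple way to allocate a fixed budget of computation engines (CEs) across pipeline stages so that the per-stage latency (approximated here as workload divided by allocated CEs) is balanced. The layer workloads, the greedy strategy, and the function name are all illustrative assumptions.

```python
# Hypothetical sketch of pipeline workload balancing: repeatedly give the
# next CE to the currently slowest stage, so that the maximum stage latency
# (workload / allocated CEs) is minimized. Illustrative only; not the
# paper's DSE method.
import heapq

def balance_ces(workloads, total_ces):
    """Allocate total_ces across stages, one CE at a time, to the bottleneck."""
    n = len(workloads)
    assert total_ces >= n, "need at least one CE per layer"
    alloc = [1] * n
    # Max-heap (via negated keys) ordered by current stage latency.
    heap = [(-w / 1, i) for i, w in enumerate(workloads)]
    heapq.heapify(heap)
    for _ in range(total_ces - n):
        _, i = heapq.heappop(heap)   # slowest stage right now
        alloc[i] += 1
        heapq.heappush(heap, (-workloads[i] / alloc[i], i))
    return alloc

if __name__ == "__main__":
    # Assumed per-layer workloads (e.g. MAC counts in arbitrary units).
    workloads = [90, 30, 60, 20]
    alloc = balance_ces(workloads, 20)
    print(alloc)
    print(max(w / a for w, a in zip(workloads, alloc)))
```

With these example workloads the greedy allocation converges to a proportional split, so every pipeline stage ends up with the same latency, which is exactly the balanced-pipeline condition the abstract describes.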