{"title":"基于FPGA的gan转置卷积资源高效加速算法研究","authors":"Xinkai Di, Haigang Yang, Zhihong Huang, Ning Mao, Yiping Jia, Yong Zheng","doi":"10.1109/ICFPT47387.2019.00011","DOIUrl":null,"url":null,"abstract":"In recent years, Generative Adversarial Networks (GANs) have been widely adopted for computer vision tasks such as generation/synthesis of massive images and 3D object modeling. The hardware acceleration of Transposed Convolution layers is especially essential since the Generative Model (Generator) as a critical component in GANs is computationally intensive in nature. In transposed Convolution, the zeros-inserting preprocessing causes sparsity of the feature maps and further results in many invalid operations. Most of the existing FPGA architectures cannot effectively tackle this issue. To address the challenges of implementing Transposed Convolution on FPGAs, we present an innovative dataflow design approach by applying the Winograd algorithm for fast processing with a high efficiency in terms of resource allocations. In addition, we propose an underlying Hardware Accelerator Architecture that features having PUs embedded in Parallel, Pipelined, and Buffered processing flow. In this paper, a parallelism-aware Memory Partition scheme is also exploded for bandwidth efficient data access. Implementations of several state-of-the-art GANs by our approach achieves an average performance of 639.2 GOPS on Xilinx ZCU102 FPGA device. In reference to an optimized conventional accelerator baseline, this work demonstrates an 8.6× (up to 11.7×) improvement in processing performance, compared to below 2.2× improvement by the other works in literature.","PeriodicalId":241340,"journal":{"name":"2019 International Conference on Field-Programmable Technology (ICFPT)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Exploring Resource-Efficient Acceleration Algorithm for Transposed Convolution of GANs on FPGA\",\"authors\":\"Xinkai Di, Haigang Yang, Zhihong Huang, Ning Mao, Yiping Jia, Yong Zheng\",\"doi\":\"10.1109/ICFPT47387.2019.00011\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, Generative Adversarial Networks (GANs) have been widely adopted for computer vision tasks such as generation/synthesis of massive images and 3D object modeling. The hardware acceleration of Transposed Convolution layers is especially essential since the Generative Model (Generator) as a critical component in GANs is computationally intensive in nature. In transposed Convolution, the zeros-inserting preprocessing causes sparsity of the feature maps and further results in many invalid operations. Most of the existing FPGA architectures cannot effectively tackle this issue. To address the challenges of implementing Transposed Convolution on FPGAs, we present an innovative dataflow design approach by applying the Winograd algorithm for fast processing with a high efficiency in terms of resource allocations. In addition, we propose an underlying Hardware Accelerator Architecture that features having PUs embedded in Parallel, Pipelined, and Buffered processing flow. In this paper, a parallelism-aware Memory Partition scheme is also exploded for bandwidth efficient data access. Implementations of several state-of-the-art GANs by our approach achieves an average performance of 639.2 GOPS on Xilinx ZCU102 FPGA device. 
In reference to an optimized conventional accelerator baseline, this work demonstrates an 8.6× (up to 11.7×) improvement in processing performance, compared to below 2.2× improvement by the other works in literature.\",\"PeriodicalId\":241340,\"journal\":{\"name\":\"2019 International Conference on Field-Programmable Technology (ICFPT)\",\"volume\":\"51 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 International Conference on Field-Programmable Technology (ICFPT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICFPT47387.2019.00011\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on Field-Programmable Technology (ICFPT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICFPT47387.2019.00011","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
In recent years, Generative Adversarial Networks (GANs) have been widely adopted for computer vision tasks such as large-scale image generation/synthesis and 3D object modeling. Hardware acceleration of Transposed Convolution layers is particularly important because the Generative Model (Generator), a critical component of GANs, is computationally intensive by nature. In Transposed Convolution, the zero-insertion preprocessing makes the feature maps sparse and consequently produces many invalid (multiply-by-zero) operations. Most existing FPGA architectures cannot tackle this issue effectively. To address the challenges of implementing Transposed Convolution on FPGAs, we present an innovative dataflow design approach that applies the Winograd algorithm for fast processing with high resource efficiency. In addition, we propose an underlying Hardware Accelerator Architecture that features PUs embedded in a Parallel, Pipelined, and Buffered processing flow. In this paper, a parallelism-aware Memory Partition scheme is also exploited for bandwidth-efficient data access. Implementations of several state-of-the-art GANs using our approach achieve an average performance of 639.2 GOPS on the Xilinx ZCU102 FPGA device. Relative to an optimized conventional accelerator baseline, this work demonstrates an 8.6× (up to 11.7×) improvement in processing performance, compared with the below-2.2× improvements reported by other works in the literature.
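To make the "invalid operations" point concrete, the following minimal NumPy sketch (not code from the paper; the helper names zero_insert and conv2d_valid are illustrative) computes a stride-2 transposed convolution the naive way, as zero-insertion followed by an ordinary convolution, and counts what share of the multiplications involve an inserted zero operand.

```python
# Minimal sketch: why zero-insertion makes transposed convolution wasteful.
# A stride-2 transposed convolution is emulated as zero-insertion + plain
# convolution, and multiplications with a zero operand are counted.
import numpy as np

def zero_insert(x, stride=2):
    """Insert (stride - 1) zeros between neighbouring pixels of a 2-D map."""
    h, w = x.shape
    up = np.zeros((h * stride - (stride - 1), w * stride - (stride - 1)), dtype=x.dtype)
    up[::stride, ::stride] = x
    return up

def conv2d_valid(x, k):
    """Plain 'valid' 2-D correlation that also counts zero-operand multiplies."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    y = np.zeros((oh, ow))
    zero_muls = total_muls = 0
    for i in range(oh):
        for j in range(ow):
            patch = x[i:i + kh, j:j + kw]
            y[i, j] = np.sum(patch * k)
            total_muls += patch.size
            zero_muls += int(np.count_nonzero(patch == 0))
    return y, zero_muls, total_muls

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 8))       # dense input feature map
kernel = rng.standard_normal((3, 3))     # 3x3 transposed-convolution kernel

upsampled = zero_insert(fmap, stride=2)  # most entries of this map are zero
out, zero_muls, total_muls = conv2d_valid(upsampled, kernel)
print(f"multiplications with a zero operand: {zero_muls / total_muls:.1%}")
```

With a stride of 2 and a 3×3 kernel, roughly three quarters of the multiply-accumulates in this formulation hit an inserted zero, which is the sparsity the abstract identifies as wasted work on a conventional accelerator (boundary handling is simplified here for brevity).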
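The paper's remedy applies the Winograd algorithm. As background only, the sketch below shows the standard 1-D F(2,3) minimal-filtering transforms in NumPy: two outputs of a 3-tap filter are produced with 4 multiplications in the transformed domain instead of the 6 a direct implementation needs. This is a generic software illustration, not the dataflow or hardware mapping proposed in the paper.

```python
# Standard 1-D Winograd F(2,3): y = AT @ ((G @ g) * (BT @ d)).
import numpy as np

BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """Two outputs of a 3-tap correlation over a 4-sample input tile."""
    U = G @ g    # filter transform (can be precomputed offline)
    V = BT @ d   # input-tile transform
    M = U * V    # 4 element-wise multiplies instead of 6
    return AT @ M

d = np.array([1.0, 2.0, 3.0, 4.0])   # input tile
g = np.array([0.5, -1.0, 2.0])       # filter taps
print(winograd_f23(d, g))                 # [4.5, 6.0]
print(np.correlate(d, g, mode="valid"))   # matches the direct result
```

In two dimensions the same transforms are nested along rows and columns (F(2×2, 3×3)), reducing the per-tile multiplication count from 36 to 16, which is the kind of arithmetic saving the proposed accelerator exploits alongside the removal of zero-operand work.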