Parallelism-flexible Convolution Core for Sparse Convolutional Neural Networks on FPGA

Salita Sombatsiri, S. Shibata, Yuki Kobayashi, Hiroaki Inoue, Takashi Takenaka, T. Hosomi, Jaehoon Yu, Yoshinori Takeuchi

IPSJ Transactions on System LSI Design Methodology, pp. 22-37, 2019. DOI: 10.2197/ipsjtsldm.12.22
Citations: 4
Abstract
This paper proposes a convolution core for sparse CNNs that can flexibly switch its parallelism scheme and degree, exploiting both intra- and inter-output parallelism of the convolutional layer, and that leverages weight sparsity by storing the compressed sparse model in compressed sparse column (CSC) format with an output-stationary dataflow. Experimental results show that the multi-parallelism scheme improves performance by 3.9 times, even in deeper layers where a conventional accelerator cannot fully exploit parallelism because of the small layer size. The proposed architecture also exploits weight sparsity; by combining multi-parallelism with weight sparsity, it achieves 5.2 times higher performance than the conventional accelerator.
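As a rough illustration of the two ideas named in the abstract, the sketch below stores a weight matrix in compressed sparse column (CSC) format and accumulates each output in place (output-stationary), fetching only the nonzero weights of its column. This is a minimal software analogy, not the paper's FPGA design; the function and variable names are assumptions introduced for illustration.

```python
import numpy as np

def to_csc(dense_w):
    """Convert a dense weight matrix (rows x cols) into CSC arrays."""
    vals, row_idx, col_ptr = [], [], [0]
    for c in range(dense_w.shape[1]):
        for r in range(dense_w.shape[0]):
            if dense_w[r, c] != 0:
                vals.append(dense_w[r, c])   # nonzero weight value
                row_idx.append(r)            # which input it multiplies
        col_ptr.append(len(vals))            # where each output column starts/ends
    return np.array(vals), np.array(row_idx), np.array(col_ptr)

def sparse_matvec_output_stationary(vals, row_idx, col_ptr, x):
    """Output-stationary accumulation: one accumulator per output column."""
    n_out = len(col_ptr) - 1
    y = np.zeros(n_out)
    for c in range(n_out):                       # each output stays "stationary"
        for k in range(col_ptr[c], col_ptr[c + 1]):
            y[c] += vals[k] * x[row_idx[k]]      # only nonzero weights are processed
    return y

# Toy example: 6 inputs, 3 outputs, ~66% weight sparsity.
W = np.array([[1., 0., 2.],
              [0., 3., 0.],
              [0., 0., 4.],
              [5., 0., 0.],
              [0., 6., 0.],
              [0., 0., 7.]])
x = np.arange(1., 7.)
vals, rows, ptr = to_csc(W)
assert np.allclose(sparse_matvec_output_stationary(vals, rows, ptr, x), W.T @ x)
```

In this view, distributing different output columns across processing elements corresponds to the inter-output parallelism mentioned in the abstract, while processing several nonzeros of one column concurrently corresponds to intra-output parallelism.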