A Scalable Architecture for CNN Accelerators Leveraging High-Performance Memories

2020 IEEE High Performance Extreme Computing Conference (HPEC). Pub Date: 2020-09-22. DOI: 10.1109/HPEC43674.2020.9286162

Maarten Hattink, G. D. Guglielmo, L. Carloni, K. Bergman

Citations: 1

Abstract

As FPGA-based accelerators become ubiquitous and more powerful, the demand for integration with High-Performance Memory (HPM) grows. Although HPMs offer much greater bandwidth than standard DDR4 DRAM, they introduce new design challenges, such as increased latency and a larger bandwidth mismatch between memory and FPGA cores. This paper presents a scalable architecture for convolutional neural network (CNN) accelerators conceived specifically to address these challenges and make full use of the memory's high bandwidth. The accelerator, which was designed using high-level synthesis, is highly configurable. The intrinsic parallelism of its architecture allows near-perfect scaling until the available memory bandwidth is saturated.
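The scaling claim in the abstract can be illustrated with a simple bandwidth-ceiling model. The sketch below is not taken from the paper: the function names, the notion of identical parallel "units", and all numbers are hypothetical, chosen only to show how aggregate throughput could grow linearly with the number of units until the memory bandwidth is saturated.

```python
import math


def aggregate_throughput(num_units, per_unit_demand_gbps, memory_bandwidth_gbps):
    """Return the data rate (GB/s) sustained by num_units parallel units.

    Assumption (not from the paper): every unit consumes data at the same
    fixed rate, and memory bandwidth is the only shared bottleneck.
    """
    demand = num_units * per_unit_demand_gbps
    # Linear scaling while demand is below the ceiling, flat once it is reached.
    return min(demand, memory_bandwidth_gbps)


def saturation_point(per_unit_demand_gbps, memory_bandwidth_gbps):
    """Return the smallest unit count whose combined demand reaches the ceiling."""
    return math.ceil(memory_bandwidth_gbps / per_unit_demand_gbps)


if __name__ == "__main__":
    # Hypothetical figures: 8 GB/s per unit, 100 GB/s of memory bandwidth.
    for units in (1, 2, 4, 8, 16):
        rate = aggregate_throughput(units, 8, 100)
        print(f"{units:2d} units -> {rate} GB/s")
    print("saturates at", saturation_point(8, 100), "units")
```

In this toy model, adding units beyond the saturation point brings no further gain, which is why matching the degree of parallelism to the memory's bandwidth matters for an accelerator of this kind.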

