NEURAghe

ACM Transactions on Reconfigurable Technology and Systems (TRETS). Pub Date: 2018-12-12. DOI: 10.1145/3284357

P. Meloni, Alessandro Capotondi, Gianfranco Deriu, Michele Brian, Francesco Conti, D. Rossi, L. Raffo, L. Benini

Citations: 58

Abstract

Deep convolutional neural networks (CNNs) obtain outstanding results in tasks that require human-level understanding of data, such as image or speech recognition. However, their computational load is significant, motivating the development of CNN-specialized accelerators. This work presents NEURAghe, a flexible and efficient hardware/software solution for the acceleration of CNNs on Zynq SoCs. NEURAghe combines the Zynq ARM cores with a powerful and flexible Convolution-Specific Processor deployed on the reconfigurable logic. The Convolution-Specific Processor embeds both a convolution engine and a programmable soft core, releasing the ARM processors from most supervision duties and allowing the accelerator to be controlled by software at an ultra-fine granularity. This methodology opens the way for cooperative heterogeneous computing: while the accelerator takes care of the bulk of the CNN workload, the ARM cores can seamlessly execute hard-to-accelerate parts of the computational graph, taking advantage of the NEON vector engines to further speed up computation. Through the companion NeuDNN software stack, NEURAghe supports end-to-end CNN-based classification with a peak performance of 169 GOps/s and an energy efficiency of 17 GOps/W. Thanks to this heterogeneous computing model, the platform improves upon the state of the art, achieving a frame rate of 5.5 frames per second (fps) on the end-to-end execution of VGG-16 and 6.6 fps on ResNet-18.
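
To make the cooperative heterogeneous model more concrete, the following minimal Python sketch splits a network's layer list into consecutive segments, each tagged for either the Convolution-Specific Processor or the ARM cores. It assumes, for illustration only, that convolution layers run on the accelerator and every other layer type falls back to the ARM/NEON side; the abstract does not say which operations NEURAghe actually offloads. The names `ACCELERATED_OPS`, `Layer`, and `partition` are hypothetical and are not part of the NeuDNN API.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical: op types assumed to run on the Convolution-Specific
# Processor (CSP). Everything else is assumed to run on the ARM cores,
# optionally vectorized with NEON. This mapping is illustrative only.
ACCELERATED_OPS = {"conv"}


@dataclass
class Layer:
    name: str
    op: str


def partition(layers: List[Layer]) -> List[Tuple[str, List[str]]]:
    """Group consecutive layers into segments tagged with their target.

    Each boundary between two segments marks a hand-off between the
    accelerator and the ARM cores.
    """
    segments: List[Tuple[str, List[str]]] = []
    for layer in layers:
        target = "csp" if layer.op in ACCELERATED_OPS else "arm_neon"
        if segments and segments[-1][0] == target:
            # Same target as the previous layer: extend the segment.
            segments[-1][1].append(layer.name)
        else:
            # Target changes: start a new segment (a hand-off point).
            segments.append((target, [layer.name]))
    return segments


if __name__ == "__main__":
    net = [
        Layer("conv1", "conv"),
        Layer("conv2", "conv"),
        Layer("pool1", "maxpool"),
        Layer("conv3", "conv"),
        Layer("fc1", "fc"),
        Layer("softmax", "softmax"),
    ]
    for target, names in partition(net):
        print(target, names)
    # csp ['conv1', 'conv2']
    # arm_neon ['pool1']
    # csp ['conv3']
    # arm_neon ['fc1', 'softmax']
```

This sketch captures only the static split of the graph. The cooperative model described in the abstract also depends on the accelerator and the ARM cores working concurrently, which this sketch does not attempt to model.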
