Deep neural networks compiler for a trace-based accelerator (short WIP paper)
Andre Xian Ming Chang, Aliasger Zaidy, L. Burzawa, E. Culurciello
Proceedings of the 19th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, 2018
DOI: 10.1145/3211332.3211333
Citations: 1
Abstract
Deep Neural Networks (DNNs) are the algorithm of choice for image processing applications. DNNs present highly parallel workloads, which has led to the emergence of custom hardware accelerators. Deep Learning (DL) models specialized for different tasks require programmable custom hardware and a compiler/mapper that translates different DNNs into an efficient dataflow on the accelerator. The goal of this paper is to present a compiler for running DNNs on Snowflake, a programmable hardware accelerator that targets DNNs. The compiler correctly generates instructions for various DL models: AlexNet, VGG, ResNet and LightCNN9. Snowflake was implemented on an FPGA with a varying number of processing units to measure how the compiler and Snowflake perform as the design scales up. The system achieves 70 frames/s and 4.5 GB/s of off-chip memory bandwidth for AlexNet without its linear layers on Xilinx's Zynq-SoC XC7Z045 FPGA.
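The abstract describes the compiler's core job: lowering the layers of a trained DNN into an instruction stream that drives the accelerator's compute units and its off-chip memory traffic. The sketch below is a minimal illustration of that kind of translation, not Snowflake's actual instruction set or mapper; the layer descriptions, instruction mnemonics, and tile size are hypothetical assumptions, and only the 70 frames/s and 4.5 GB/s figures come from the paper.

```python
# Hypothetical sketch of a DNN-to-accelerator lowering pass.
# The instruction mnemonics (LD_W, LD_A, CONV, ST_A) and the tile size are
# illustrative assumptions, not Snowflake's real ISA or compiler.
from dataclasses import dataclass

@dataclass
class ConvLayer:
    name: str
    in_ch: int
    out_ch: int
    kernel: int   # square kernel size
    out_h: int    # output feature-map height
    out_w: int    # output feature-map width

def lower(layers, tile_rows=16):
    """Emit a flat instruction list, tiling each layer's output rows so a
    tile's activations fit in a (hypothetical) on-chip buffer."""
    program = []
    for layer in layers:
        # Load all weights for the layer once.
        program.append(("LD_W", layer.name, layer.out_ch * layer.in_ch * layer.kernel ** 2))
        for row in range(0, layer.out_h, tile_rows):
            rows = min(tile_rows, layer.out_h - row)
            program.append(("LD_A", layer.name, row, rows))   # fetch input tile
            program.append(("CONV", layer.name, row, rows))   # MACs over the tile
            program.append(("ST_A", layer.name, row, rows))   # write output tile
    return program

# First two convolutional layers of an AlexNet-like model (shapes are the
# standard published ones; the mapping itself is still only a sketch).
alexnet_head = [
    ConvLayer("conv1", 3, 96, 11, 55, 55),
    ConvLayer("conv2", 96, 256, 5, 27, 27),
]

insns = lower(alexnet_head)
print(f"{len(insns)} instructions for the first two layers")

# Back-of-the-envelope check using only the figures quoted in the abstract:
# at 70 frames/s and 4.5 GB/s of off-chip bandwidth, each frame moves at most
# roughly 4.5e9 / 70 ≈ 64 MB of data to and from DRAM.
print(f"~{4.5e9 / 70 / 1e6:.0f} MB of off-chip traffic per frame")
```

A real mapper would also pick tile sizes from the accelerator's buffer capacities and schedule double buffering to overlap memory transfers with compute; the point of the sketch is only the shape of the translation from layer descriptions to a linear instruction trace.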