CQNN: A CGRA-Based QNN Framework
Tong Geng, Chunshu Wu, Cheng Tan, B. Fang, Ang Li, Martin C. Herbordt
2020 IEEE High Performance Extreme Computing Conference (HPEC), September 22, 2020
DOI: 10.1109/HPEC43674.2020.9286194
Citations: 4
Abstract
Quantized Neural Networks (QNNs) have drawn tremendous attention since, compared with Convolutional Neural Networks (CNNs), they often dramatically reduce computation, communication, and storage demands with negligible loss in accuracy. To find an optimal balance between performance and accuracy, developers use different data-widths for different layers and channels. Given this large parameter space, it is challenging to design a QNN accelerator that is efficient across such varied and flexible model configurations. In this paper we propose CQNN, a novel Coarse-Grained Reconfigurable Architecture (CGRA)-based QNN acceleration framework. CQNN provides a large number of basic components for binary functions. By programming CQNN at runtime according to the target QNN model, these basic components are composed to support QNN functions with any data-width and hyperparameter requirements. The result is an optimal QNN accelerator for the target model. The framework includes a compiler, hardware design, simulator, and RTL generator. Experimental results show that CQNN can complete inference of AlexNet and VGG-16 within 0.13 ms and 2.63 ms, respectively. We demonstrate the design on an FPGA platform; however, this is only for showcasing the method: the approach does not rely on any FPGA-specific features and can thus be implemented as an ASIC as well.
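To make the abstract's key idea concrete, the sketch below illustrates the standard way a multi-bit quantized dot product decomposes into 1-bit ("binary") dot products, which is the kind of operation that components for binary functions can be composed to support at arbitrary data-widths. This is a minimal illustration of the general technique, not CQNN's actual implementation; all function names and the unsigned-quantization assumption are ours.

```python
# Illustrative sketch (not the paper's code): an n-bit x m-bit quantized dot
# product expressed as a power-of-two-weighted sum of 1-bit dot products.
# Assumes unsigned quantized values; real QNN accelerators handle signs and
# scales on top of this.

def bit_planes(values, width):
    """Split each unsigned quantized value into `width` binary bit-planes."""
    return [[(v >> b) & 1 for v in values] for b in range(width)]

def binary_dot(xs, ws):
    """1-bit dot product: bitwise AND followed by a popcount-style sum."""
    return sum(x & w for x, w in zip(xs, ws))

def quantized_dot(acts, wts, act_width, wt_width):
    """Sum 2^(i+j) * binary_dot over all activation/weight bit-plane pairs."""
    a_planes = bit_planes(acts, act_width)
    w_planes = bit_planes(wts, wt_width)
    total = 0
    for i, ap in enumerate(a_planes):
        for j, wp in enumerate(w_planes):
            total += (1 << (i + j)) * binary_dot(ap, wp)
    return total

# Example: 3-bit activations x 2-bit weights matches the direct dot product.
acts, wts = [5, 3, 7], [2, 1, 3]
assert quantized_dot(acts, wts, 3, 2) == sum(a * w for a, w in zip(acts, wts))
```

Because the per-layer widths `act_width` and `wt_width` are just loop bounds here, the same pool of binary-dot units can serve any mixed-precision configuration, which is the flexibility the CGRA-based approach targets.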