CQNN: a CGRA-based QNN Framework

Tong Geng, Chunshu Wu, Cheng Tan, B. Fang, Ang Li, Martin C. Herbordt
{"title":"CQNN: a CGRA-based QNN Framework","authors":"Tong Geng, Chunshu Wu, Cheng Tan, B. Fang, Ang Li, Martin C. Herbordt","doi":"10.1109/HPEC43674.2020.9286194","DOIUrl":null,"url":null,"abstract":"Quantized Neural Networks (QNNs) have drawn tremendous attention since - when compared with Convolution Neural Networks (CNNs) - they often dramatically reduce computation, communication, and storage demands with negligible loss in accuracy. To find an optimal balance between performance and accuracy, developers use different data-widths for different layers and channels. Given this large parameter space, it is challenging to design a QNN accelerator which is generally efficient for various and flexible model configurations. In this paper we propose CQNN, a novel Coarse-Grained Reconfigurable Architecture-based (CGRA) QNN acceleration framework. CQNN has a large number of basic components for binary functions. By programming CQNN at runtime according to the target QNN model, these basic components are integrated to support QNN functions with any data-width and hyperparameter requirements. The result is an optimal QNN for the target model. The framework includes compiler, hardware design, simulator, and RTL generator. Experimental results show CQNNs can complete the inference of AlexNet and VGG-16 within 0.13ms and 2.63ms, respectively. We demonstrate the design on an FPGA platform; however, this is only for showcasing the method: the approach does not rely on any FPGA-specific features and can thus be implemented as ASIC as well.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC43674.2020.9286194","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Quantized Neural Networks (QNNs) have drawn tremendous attention since - when compared with Convolution Neural Networks (CNNs) - they often dramatically reduce computation, communication, and storage demands with negligible loss in accuracy. To find an optimal balance between performance and accuracy, developers use different data-widths for different layers and channels. Given this large parameter space, it is challenging to design a QNN accelerator which is generally efficient for various and flexible model configurations. In this paper we propose CQNN, a novel Coarse-Grained Reconfigurable Architecture-based (CGRA) QNN acceleration framework. CQNN has a large number of basic components for binary functions. By programming CQNN at runtime according to the target QNN model, these basic components are integrated to support QNN functions with any data-width and hyperparameter requirements. The result is an optimal QNN for the target model. The framework includes compiler, hardware design, simulator, and RTL generator. Experimental results show CQNNs can complete the inference of AlexNet and VGG-16 within 0.13ms and 2.63ms, respectively. We demonstrate the design on an FPGA platform; however, this is only for showcasing the method: the approach does not rely on any FPGA-specific features and can thus be implemented as ASIC as well.
CQNN:基于cgra的QNN框架
与卷积神经网络(cnn)相比,量化神经网络(QNNs)经常显著减少计算、通信和存储需求,而精度损失可以忽略不计,因此引起了极大的关注。为了在性能和准确性之间找到最佳平衡,开发人员为不同的层和通道使用不同的数据宽度。在如此大的参数空间下,设计一种对各种灵活的模型配置都普遍有效的QNN加速器是一项挑战。本文提出了一种新的基于粗粒度可重构体系结构(CGRA)的QNN加速框架CQNN。CQNN具有大量二元函数的基本分量。通过在运行时根据目标QNN模型对CQNN进行编程,将这些基本组件集成在一起,以支持任意数据宽度和超参数要求的QNN函数。结果是目标模型的最优QNN。该框架包括编译器、硬件设计、模拟器和RTL生成器。实验结果表明,cqnn分别在0.13ms和2.63ms内完成对AlexNet和VGG-16的推理。我们在FPGA平台上演示了该设计;然而,这只是为了展示该方法:该方法不依赖于任何fpga特定的功能,因此也可以作为ASIC实现。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信