基于FPGA的高通量混合精度CNN加速器设计

Cecilia Latotzke, Tim Ciesielski, T. Gemmeke
{"title":"基于FPGA的高通量混合精度CNN加速器设计","authors":"Cecilia Latotzke, Tim Ciesielski, T. Gemmeke","doi":"10.1109/FPL57034.2022.00061","DOIUrl":null,"url":null,"abstract":"Convolutional Neural Networks (CNNs) reach high accuracies in various application domains, but require large amounts of computation and incur costly data movements. One method to decrease these costs while trading accuracy is weight and/or activation word-length reduction. Thereby, layer-wise mixed-precision quantization allows for more efficient results while inflating the design space. In this work, we present an in-depth quantitative methodology to efficiently explore the design space considering the limited hardware resources of a given FPGA. Our holistic exploration approach vertically traverses the various design entry levels from the architectural down to the logic level, and laterally covers optimization from processing elements to dataflow for an efficient mixed-precision CNN accelerator. Our resulting hardware accelerators implement truly mixed-precision operations that enable efficient execution of layer-wise and channel-wise quantized CNNs. Mapping feed-forward and identity-shortcut-connection mixed-precision CNNs result in competitive accuracy-throughout trade-offs: 245 frames/s with 87.48% Top-5 accuracy for ResNet-18 and 92.9% Top-5 accuracy with 1.13 TOps/s for ResNet-152, respectively. Thereby, the required memory footprint for parameters is reduced by 4.9 × and 9.4 × compared to the respective floating-point baseline.","PeriodicalId":380116,"journal":{"name":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Design of High-Throughput Mixed-Precision CNN Accelerators on FPGA\",\"authors\":\"Cecilia Latotzke, Tim Ciesielski, T. Gemmeke\",\"doi\":\"10.1109/FPL57034.2022.00061\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Convolutional Neural Networks (CNNs) reach high accuracies in various application domains, but require large amounts of computation and incur costly data movements. One method to decrease these costs while trading accuracy is weight and/or activation word-length reduction. Thereby, layer-wise mixed-precision quantization allows for more efficient results while inflating the design space. In this work, we present an in-depth quantitative methodology to efficiently explore the design space considering the limited hardware resources of a given FPGA. Our holistic exploration approach vertically traverses the various design entry levels from the architectural down to the logic level, and laterally covers optimization from processing elements to dataflow for an efficient mixed-precision CNN accelerator. Our resulting hardware accelerators implement truly mixed-precision operations that enable efficient execution of layer-wise and channel-wise quantized CNNs. Mapping feed-forward and identity-shortcut-connection mixed-precision CNNs result in competitive accuracy-throughout trade-offs: 245 frames/s with 87.48% Top-5 accuracy for ResNet-18 and 92.9% Top-5 accuracy with 1.13 TOps/s for ResNet-152, respectively. Thereby, the required memory footprint for parameters is reduced by 4.9 × and 9.4 × compared to the respective floating-point baseline.\",\"PeriodicalId\":380116,\"journal\":{\"name\":\"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FPL57034.2022.00061\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FPL57034.2022.00061","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

摘要

卷积神经网络(Convolutional Neural Networks, cnn)在许多应用领域都能达到较高的精度,但需要大量的计算量和昂贵的数据移动成本。在交易准确性的同时降低这些成本的一种方法是减少权重和/或激活词长度。因此,分层混合精度量化可以在扩大设计空间的同时获得更有效的结果。在这项工作中,我们提出了一种深入的定量方法,以有效地探索设计空间,考虑到给定FPGA的有限硬件资源。我们的整体探索方法垂直地跨越了从架构到逻辑层次的各种设计入口,横向地涵盖了从处理元素到数据流的优化,以实现高效的混合精度CNN加速器。我们得到的硬件加速器实现了真正的混合精度操作,能够有效地执行分层和信道量化cnn。映射前馈和身份快捷连接混合精度cnn导致了竞争性的准确性权衡:245帧/秒,ResNet-18的前5精度为87.48%,ResNet-152的前5精度为92.9%,1.13 TOps/s。因此,与各自的浮点基准相比,参数所需的内存占用减少了4.9倍和9.4倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Design of High-Throughput Mixed-Precision CNN Accelerators on FPGA
Convolutional Neural Networks (CNNs) reach high accuracies in various application domains, but require large amounts of computation and incur costly data movements. One method to decrease these costs while trading accuracy is weight and/or activation word-length reduction. Thereby, layer-wise mixed-precision quantization allows for more efficient results while inflating the design space. In this work, we present an in-depth quantitative methodology to efficiently explore the design space considering the limited hardware resources of a given FPGA. Our holistic exploration approach vertically traverses the various design entry levels from the architectural down to the logic level, and laterally covers optimization from processing elements to dataflow for an efficient mixed-precision CNN accelerator. Our resulting hardware accelerators implement truly mixed-precision operations that enable efficient execution of layer-wise and channel-wise quantized CNNs. Mapping feed-forward and identity-shortcut-connection mixed-precision CNNs result in competitive accuracy-throughout trade-offs: 245 frames/s with 87.48% Top-5 accuracy for ResNet-18 and 92.9% Top-5 accuracy with 1.13 TOps/s for ResNet-152, respectively. Thereby, the required memory footprint for parameters is reduced by 4.9 × and 9.4 × compared to the respective floating-point baseline.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信