Low Precision Networks for Efficient Inference on FPGAs

R. Abra, Dmitry Denisenko, Richard Allen, Tim Vanderhoek, Sarah Wolstencroft, Peter M. Gibson
{"title":"fpga上高效推理的低精度网络","authors":"R. Abra, Dmitry Denisenko, Richard Allen, Tim Vanderhoek, Sarah Wolstencroft, Peter M. Gibson","doi":"10.1109/ICFPT52863.2021.9609837","DOIUrl":null,"url":null,"abstract":"Block Floating Point (BFP) is a type of quantization that combines high dynamic range with low-cost inference. BFP can be implemented efficiently on FPGA hardware and, at low precision, halves the logic footprint versus blocked FP16 while maintaining accuracy. Moving to very low precision halves the logic footprint again and retraining allows the recovery of any accuracy lost in transition. This paper describes our approach to achieving target accuracy and FPGA resource usage in a low-precision end-to-end AI solution. We go on to investigate the effects of retraining with our software model that replicates the low-level implementation of BFP on FPGA. Our solution allows efficacy testing for the quantization of custom networks and provides accuracy indications and resource usage for the final application. Using our solution, we were able to quantize ResNet 50, SSD300 and UNet to int5/4bfp precision without losing accuracy while reducing FPGA resources and improving performance.","PeriodicalId":376220,"journal":{"name":"2021 International Conference on Field-Programmable Technology (ICFPT)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Low Precision Networks for Efficient Inference on FPGAs\",\"authors\":\"R. Abra, Dmitry Denisenko, Richard Allen, Tim Vanderhoek, Sarah Wolstencroft, Peter M. Gibson\",\"doi\":\"10.1109/ICFPT52863.2021.9609837\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Block Floating Point (BFP) is a type of quantization that combines high dynamic range with low-cost inference. BFP can be implemented efficiently on FPGA hardware and, at low precision, halves the logic footprint versus blocked FP16 while maintaining accuracy. Moving to very low precision halves the logic footprint again and retraining allows the recovery of any accuracy lost in transition. This paper describes our approach to achieving target accuracy and FPGA resource usage in a low-precision end-to-end AI solution. We go on to investigate the effects of retraining with our software model that replicates the low-level implementation of BFP on FPGA. Our solution allows efficacy testing for the quantization of custom networks and provides accuracy indications and resource usage for the final application. 
Using our solution, we were able to quantize ResNet 50, SSD300 and UNet to int5/4bfp precision without losing accuracy while reducing FPGA resources and improving performance.\",\"PeriodicalId\":376220,\"journal\":{\"name\":\"2021 International Conference on Field-Programmable Technology (ICFPT)\",\"volume\":\"100 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 International Conference on Field-Programmable Technology (ICFPT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICFPT52863.2021.9609837\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Field-Programmable Technology (ICFPT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICFPT52863.2021.9609837","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Block Floating Point (BFP) is a type of quantization that combines high dynamic range with low-cost inference. BFP can be implemented efficiently on FPGA hardware and, at low precision, halves the logic footprint versus blocked FP16 while maintaining accuracy. Moving to very low precision halves the logic footprint again and retraining allows the recovery of any accuracy lost in transition. This paper describes our approach to achieving target accuracy and FPGA resource usage in a low-precision end-to-end AI solution. We go on to investigate the effects of retraining with our software model that replicates the low-level implementation of BFP on FPGA. Our solution allows efficacy testing for the quantization of custom networks and provides accuracy indications and resource usage for the final application. Using our solution, we were able to quantize ResNet 50, SSD300 and UNet to int5/4bfp precision without losing accuracy while reducing FPGA resources and improving performance.
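
To make the blocking scheme concrete, below is a minimal NumPy sketch of block floating point quantization: values are grouped into fixed-size blocks, each block shares a single power-of-two exponent derived from its largest magnitude, and every value keeps only a signed low-bit integer mantissa. The block size, bit width, exponent selection, and rounding/clipping choices here are illustrative assumptions for exposition, not the FPGA implementation described in the paper.

```python
import numpy as np

def bfp_quantize(x, block_size=32, mantissa_bits=5):
    """Quantize-dequantize a 1-D array with block floating point (BFP):
    each block of `block_size` values shares one power-of-two exponent,
    and each value keeps only a signed `mantissa_bits`-bit integer mantissa.
    Illustrative model only; parameters are assumptions, not the paper's design."""
    x = np.asarray(x, dtype=np.float64)
    pad = (-x.size) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)

    # Shared exponent per block, chosen from the largest magnitude in the block.
    max_abs = np.max(np.abs(blocks), axis=1, keepdims=True)
    exp = np.ceil(np.log2(np.where(max_abs > 0, max_abs, 1.0)))

    # Each stored value is mantissa * 2^(exp - (mantissa_bits - 1)).
    scale = 2.0 ** (exp - (mantissa_bits - 1))
    mant_max = 2 ** (mantissa_bits - 1) - 1
    mantissas = np.clip(np.round(blocks / scale), -mant_max, mant_max)

    # Dequantize so the caller can measure the quantization error directly.
    return (mantissas * scale).reshape(-1)[: x.size]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=1024)
    w_q = bfp_quantize(w, block_size=32, mantissa_bits=5)
    print("mean abs quantization error:", np.mean(np.abs(w - w_q)))
```

Because the exponent is shared across a whole block, the per-value storage and multiplier cost approach the mantissa width alone, which is the general source of BFP's logic savings over blocked FP16 that the abstract refers to.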