{"title":"不衡量就无法提高","authors":"Andrew Boutros, S. Yazdanshenas, Vaughn Betz","doi":"10.1145/3242898","DOIUrl":null,"url":null,"abstract":"Recently, deep learning (DL) has become best-in-class for numerous applications but at a high computational cost that necessitates high-performance energy-efficient acceleration. The reconfigurability of FPGAs is appealing due to the rapid change in DL models but also causes lower performance and area-efficiency compared to ASICs. In this article, we implement three state-of-the-art computing architectures (CAs) for convolutional neural network (CNN) inference on FPGAs and ASICs. By comparing the FPGA and ASIC implementations, we highlight the area and performance costs of programmability to pinpoint the inefficiencies in current FPGA architectures. We perform our experiments using three variations of these CAs for AlexNet, VGG-16 and ResNet-50 to allow extensive comparisons. We find that the performance gap varies significantly from 2.8× to 6.3×, while the area gap is consistent across CAs with an 8.7 average FPGA-to-ASIC area ratio. Among different blocks of the CAs, the convolution engine, constituting up to 60% of the total area, has a high area ratio ranging from 13 to 31. Motivated by our FPGA vs. ASIC comparisons, we suggest FPGA architectural changes such as increasing DSP block count, enhancing low-precision support in DSP blocks and rethinking the on-chip memories to reduce the programmability gap for DL applications.","PeriodicalId":162787,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems (TRETS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":"{\"title\":\"You Cannot Improve What You Do not Measure\",\"authors\":\"Andrew Boutros, S. Yazdanshenas, Vaughn Betz\",\"doi\":\"10.1145/3242898\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recently, deep learning (DL) has become best-in-class for numerous applications but at a high computational cost that necessitates high-performance energy-efficient acceleration. The reconfigurability of FPGAs is appealing due to the rapid change in DL models but also causes lower performance and area-efficiency compared to ASICs. In this article, we implement three state-of-the-art computing architectures (CAs) for convolutional neural network (CNN) inference on FPGAs and ASICs. By comparing the FPGA and ASIC implementations, we highlight the area and performance costs of programmability to pinpoint the inefficiencies in current FPGA architectures. We perform our experiments using three variations of these CAs for AlexNet, VGG-16 and ResNet-50 to allow extensive comparisons. We find that the performance gap varies significantly from 2.8× to 6.3×, while the area gap is consistent across CAs with an 8.7 average FPGA-to-ASIC area ratio. Among different blocks of the CAs, the convolution engine, constituting up to 60% of the total area, has a high area ratio ranging from 13 to 31. Motivated by our FPGA vs. 
ASIC comparisons, we suggest FPGA architectural changes such as increasing DSP block count, enhancing low-precision support in DSP blocks and rethinking the on-chip memories to reduce the programmability gap for DL applications.\",\"PeriodicalId\":162787,\"journal\":{\"name\":\"ACM Transactions on Reconfigurable Technology and Systems (TRETS)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-12-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"22\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Reconfigurable Technology and Systems (TRETS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3242898\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Reconfigurable Technology and Systems (TRETS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3242898","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Recently, deep learning (DL) has become best-in-class for numerous applications, but at a high computational cost that necessitates high-performance, energy-efficient acceleration. The reconfigurability of FPGAs is appealing given the rapid pace of change in DL models, but it also results in lower performance and area efficiency compared to ASICs. In this article, we implement three state-of-the-art computing architectures (CAs) for convolutional neural network (CNN) inference on both FPGAs and ASICs. By comparing the FPGA and ASIC implementations, we highlight the area and performance costs of programmability and pinpoint the inefficiencies in current FPGA architectures. We perform our experiments using three variations of these CAs for AlexNet, VGG-16, and ResNet-50 to allow extensive comparisons. We find that the performance gap varies significantly, from 2.8× to 6.3×, while the area gap is consistent across CAs, with an average FPGA-to-ASIC area ratio of 8.7×. Among the different blocks of the CAs, the convolution engine, which constitutes up to 60% of the total area, has a high area ratio ranging from 13× to 31×. Motivated by our FPGA vs. ASIC comparisons, we suggest FPGA architectural changes such as increasing the DSP block count, enhancing low-precision support in DSP blocks, and rethinking the on-chip memories to reduce the programmability gap for DL applications.
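As a rough illustration (not from the paper) of how block-level ratios aggregate into the chip-level gap the abstract reports, the sketch below combines per-block FPGA-to-ASIC area ratios via a weighted harmonic mean. Only the convolution engine's 60% area share and its 13× lower-bound ratio come from the abstract; the remaining block fractions and ratios are hypothetical values chosen for illustration.

```python
# Illustrative sketch: combining per-block FPGA-to-ASIC area ratios into
# an overall chip-level area ratio. Block names, fractions, and ratios
# other than the convolution engine's are assumptions, not paper data.

def overall_area_ratio(blocks):
    """blocks: list of (fpga_area_fraction, fpga_to_asic_ratio) tuples.

    If block i occupies fraction f_i of the FPGA implementation's area
    and is r_i times larger than its ASIC counterpart, the ASIC design
    occupies sum(f_i / r_i) of the FPGA area, so the overall
    FPGA-to-ASIC ratio is the weighted harmonic mean 1 / sum(f_i / r_i).
    """
    assert abs(sum(f for f, _ in blocks) - 1.0) < 1e-9  # fractions must cover the chip
    return 1.0 / sum(f / r for f, r in blocks)

example = [
    (0.60, 13.0),  # convolution engine: fraction and ratio from the abstract
    (0.30, 6.0),   # on-chip memories: assumed ratio
    (0.10, 4.0),   # control and miscellaneous logic: assumed ratio
]
print(f"overall FPGA-to-ASIC area ratio ~ {overall_area_ratio(example):.1f}x")
# -> ~8.3x, in the vicinity of the 8.7x average the abstract reports
```

Under these assumed numbers, the convolution engine's large share and high ratio dominate the overall gap, which is consistent with the abstract's focus on that block as the main target for architectural improvement.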