Inf4Edge: Automatic Resource-aware Generation of Energy-efficient CNN Inference Accelerator for Edge Embedded FPGAs

Ali Jahanshahi, Rasool Sharifi, Mohammadreza Rezvani, Hadi Zamani
{"title":"Inf4Edge:用于边缘嵌入式fpga的自动资源感知节能CNN推理加速器","authors":"Ali Jahanshahi, Rasool Sharifi, Mohammadreza Rezvani, Hadi Zamani","doi":"10.1109/IGSC54211.2021.9651650","DOIUrl":null,"url":null,"abstract":"Convolutional Neural Networks (CNN) have achieved great success in a large number of applications and have been among the most powerful and widely used techniques in computer vision. CNN inference is very computation-intensive which makes it difficult to be integrated into resource-constrained embedded devices such as smart phones, smart glasses, and robots. Along side inference latency, energy-efficiency is also of great importance when it comes to embedded devices with limited computational, storage, and energy resources. Embedded FPGAs, as a fast and energy-efficient solution, are one of widely used platforms for accelerating CNN inference. However, the difficulty of programming and their limited hardware resources have made them a less attractive option to the users. In this paper, we propose Inf4Edge, an automated framework for designing CNN inference accelerator on small embedded FPGAs. The proposed framework seamlessly generates a CNNs inference accelerator that fits the target FPGA using different resource-aware optimization techniques. We eliminate the overhead of transferring the data to/from FPGA back and forth which introduces latency and energy consumption. To avoid the data transfer overhead, we keep all of the data on the FPGA on-chip memory which makes the generated inference accelerator faster and more energy-efficient. Given a high-level description of the CNN and a data set, the framework builds and trains the model, and generates an optimized CNN inference accelerator for the target FPGA. As a case study, we use 16-bit fixed-point data in the generated CNN inference accelerator on a small FPGA and compare it to the same software model running on the FPGA's ARM processor. Using 16-bit fixed-point data type results in ~ 2% accuracy loss in the CNN inference accelerator. In return, we get up to $15.86\\times$ speedup performing inference on the FPGA.","PeriodicalId":334989,"journal":{"name":"2021 12th International Green and Sustainable Computing Conference (IGSC)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Inf4Edge: Automatic Resource-aware Generation of Energy-efficient CNN Inference Accelerator for Edge Embedded FPGAs\",\"authors\":\"Ali Jahanshahi, Rasool Sharifi, Mohammadreza Rezvani, Hadi Zamani\",\"doi\":\"10.1109/IGSC54211.2021.9651650\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Convolutional Neural Networks (CNN) have achieved great success in a large number of applications and have been among the most powerful and widely used techniques in computer vision. CNN inference is very computation-intensive which makes it difficult to be integrated into resource-constrained embedded devices such as smart phones, smart glasses, and robots. Along side inference latency, energy-efficiency is also of great importance when it comes to embedded devices with limited computational, storage, and energy resources. Embedded FPGAs, as a fast and energy-efficient solution, are one of widely used platforms for accelerating CNN inference. However, the difficulty of programming and their limited hardware resources have made them a less attractive option to the users. 
In this paper, we propose Inf4Edge, an automated framework for designing CNN inference accelerator on small embedded FPGAs. The proposed framework seamlessly generates a CNNs inference accelerator that fits the target FPGA using different resource-aware optimization techniques. We eliminate the overhead of transferring the data to/from FPGA back and forth which introduces latency and energy consumption. To avoid the data transfer overhead, we keep all of the data on the FPGA on-chip memory which makes the generated inference accelerator faster and more energy-efficient. Given a high-level description of the CNN and a data set, the framework builds and trains the model, and generates an optimized CNN inference accelerator for the target FPGA. As a case study, we use 16-bit fixed-point data in the generated CNN inference accelerator on a small FPGA and compare it to the same software model running on the FPGA's ARM processor. Using 16-bit fixed-point data type results in ~ 2% accuracy loss in the CNN inference accelerator. In return, we get up to $15.86\\\\times$ speedup performing inference on the FPGA.\",\"PeriodicalId\":334989,\"journal\":{\"name\":\"2021 12th International Green and Sustainable Computing Conference (IGSC)\",\"volume\":\"20 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-10-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 12th International Green and Sustainable Computing Conference (IGSC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IGSC54211.2021.9651650\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 12th International Green and Sustainable Computing Conference (IGSC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IGSC54211.2021.9651650","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

Convolutional Neural Networks (CNNs) have achieved great success in a large number of applications and are among the most powerful and widely used techniques in computer vision. CNN inference is very computation-intensive, which makes it difficult to integrate into resource-constrained embedded devices such as smartphones, smart glasses, and robots. Alongside inference latency, energy efficiency is also of great importance for embedded devices with limited computational, storage, and energy resources. Embedded FPGAs, as a fast and energy-efficient solution, are one of the widely used platforms for accelerating CNN inference. However, the difficulty of programming them and their limited hardware resources have made them a less attractive option for users. In this paper, we propose Inf4Edge, an automated framework for designing CNN inference accelerators on small embedded FPGAs. The proposed framework seamlessly generates a CNN inference accelerator that fits the target FPGA using different resource-aware optimization techniques. We eliminate the overhead of transferring data back and forth to/from the FPGA, which introduces latency and energy consumption. To avoid this data-transfer overhead, we keep all of the data in the FPGA's on-chip memory, which makes the generated inference accelerator faster and more energy-efficient. Given a high-level description of the CNN and a data set, the framework builds and trains the model and generates an optimized CNN inference accelerator for the target FPGA. As a case study, we use 16-bit fixed-point data in the generated CNN inference accelerator on a small FPGA and compare it to the same software model running on the FPGA's ARM processor. Using the 16-bit fixed-point data type results in ~2% accuracy loss in the CNN inference accelerator. In return, we get up to a 15.86× speedup performing inference on the FPGA.
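The abstract attributes the reported ~2% accuracy loss (and up to 15.86× speedup over the ARM software baseline) to moving the model from floating point to 16-bit fixed point. The sketch below is a minimal illustration of that kind of float-to-fixed-point conversion; the Q8.8 split (8 integer bits, 8 fractional bits), the round-to-nearest-with-saturation policy, and the helper names are assumptions for demonstration only, since the paper's exact quantization scheme is not reproduced here.

```python
# Illustrative sketch only: the paper does not give its quantization code.
# Q8.8 format and round-to-nearest are assumptions for demonstration.
import numpy as np

FRAC_BITS = 8                # assumed Q8.8 split: 8 integer bits, 8 fractional bits
SCALE = 1 << FRAC_BITS       # 256

def to_fixed16(x: np.ndarray) -> np.ndarray:
    """Quantize float32 values to signed 16-bit fixed point (round to nearest, saturate)."""
    q = np.round(x * SCALE)
    return np.clip(q, -32768, 32767).astype(np.int16)

def to_float(q: np.ndarray) -> np.ndarray:
    """Dequantize back to float32, e.g. to measure the error the accelerator would see."""
    return q.astype(np.float32) / SCALE

# Example: quantize a hypothetical 3x3 conv weight tensor and report the introduced error.
weights = np.random.randn(64, 3, 3, 3).astype(np.float32)
q_weights = to_fixed16(weights)
max_err = np.abs(weights - to_float(q_weights)).max()
print(f"max quantization error: {max_err:.6f}")  # ~0.002 for values within the Q8.8 range
```

A side benefit of such a 16-bit representation, consistent with the paper's design goal of keeping all data in on-chip memory, is that weights and activations occupy half the storage of 32-bit floating point, which helps the whole model fit in the limited BRAM of a small embedded FPGA.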