Fatima Zahra Guerrouj, Mohamed Abouzahir, M. Ramzi, E. Abdali
2021 4th International Symposium on Advanced Electrical and Communication Technologies (ISAECT), published 2021-12-06. DOI: 10.1109/ISAECT53699.2021.9668607
Analysis of the acceleration of deep learning inference models on a heterogeneous architecture based on OpenVINO
Convolutional neural networks (CNNs) are a powerful tool for many applications, and this capability is in high demand in embedded systems for video surveillance, speech recognition, and image analysis. However, their high computational intensity limits the use of CNNs in real-time domains where computational speed is critical, so an appropriate accelerator is required to meet these constraints. On the one hand, GPUs are widely used to accelerate CNNs, but at the cost of high power dissipation. On the other hand, FPGA implementations are gaining momentum due to their low power consumption and easy reconfigurability. In this work, we evaluate the inference performance of 10 classification models and 9 object detection models using the OpenVINO toolkit. In addition, we analyze the implementation of these models on the DE5a-Net DDR4 board equipped with an Arria 10 GX FPGA. The results show that full-precision (FP32) classification models on a heterogeneous FPGA/CPU architecture run on average 3.6× faster than on the CPU alone.
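The heterogeneous FPGA/CPU execution described above is typically expressed in OpenVINO through the HETERO plugin, which runs supported layers on the FPGA and falls back to the CPU for the rest. The sketch below illustrates that kind of benchmarking flow; it is not the paper's code. The model path, run count, and benchmarking loop are assumptions, and the `openvino.runtime` API shown is from OpenVINO 2022+ (the paper's 2021 toolkit used the older Inference Engine API).

```python
# Hedged sketch of an OpenVINO latency comparison between CPU-only and
# heterogeneous FPGA/CPU execution. Assumes an IR model (.xml/.bin) produced
# by the Model Optimizer; names and paths here are illustrative only.

def device_string(use_fpga: bool) -> str:
    """Pick the OpenVINO device: heterogeneous FPGA/CPU or CPU-only."""
    # HETERO tries the first device (FPGA) and falls back to the CPU
    # for any layers the FPGA plugin does not support.
    return "HETERO:FPGA,CPU" if use_fpga else "CPU"

def benchmark(model_xml: str, use_fpga: bool, runs: int = 100) -> float:
    """Average synchronous inference latency in seconds over `runs` requests."""
    import time
    import numpy as np
    from openvino.runtime import Core  # requires the `openvino` package

    core = Core()
    model = core.read_model(model_xml)  # FP32 IR, e.g. "resnet50.xml"
    compiled = core.compile_model(model, device_string(use_fpga))
    inp = compiled.input(0)
    dummy = np.random.rand(*inp.shape).astype(np.float32)  # synthetic input

    start = time.perf_counter()
    for _ in range(runs):
        compiled([dummy])  # synchronous inference request
    return (time.perf_counter() - start) / runs
```

A comparison like the paper's would then divide `benchmark(path, use_fpga=False)` by `benchmark(path, use_fpga=True)` per model to obtain the speedup factor.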