Analysis of the acceleration of deep learning inference models on a heterogeneous architecture based on OpenVINO
Fatima Zahra Guerrouj, Mohamed Abouzahir, M. Ramzi, E. Abdali
2021 4th International Symposium on Advanced Electrical and Communication Technologies (ISAECT), 6 December 2021
DOI: 10.1109/ISAECT53699.2021.9668607
Abstract
Convolutional neural networks (CNNs) are a powerful tool for many different applications, and this capability is in high demand in embedded systems for video surveillance, speech recognition, and image analysis. However, their high computational intensity limits their use in real-time domains where processing speed is critical, so an appropriate accelerator is required to meet these constraints. On the one hand, GPUs are widely used to accelerate CNNs, but at the cost of high power dissipation. On the other hand, FPGA implementations are gaining traction rapidly thanks to their low power consumption and ease of reconfiguration. In this work, we evaluate the inference performance of 10 classification models and 9 object detection models using the OpenVINO toolkit. In addition, we analyze the implementation of these models on the DE5a-Net DDR4 board equipped with an Arria 10 GX FPGA. The results show that full-precision FP32 classification models running on the heterogeneous FPGA/CPU architecture are on average 3.6× faster than on the CPU alone.
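
The evaluation workflow described above follows the standard OpenVINO Inference Engine flow: convert a trained model to the Intermediate Representation (IR), then compile it for a heterogeneous device target so that unsupported layers fall back to the CPU. The snippet below is a minimal sketch of that flow, not code from the paper; the model paths are placeholders, and the "HETERO:FPGA,CPU" device name assumes an OpenVINO release with the FPGA plugin enabled for the Arria 10 board.

```python
# Minimal sketch: synchronous inference on a heterogeneous FPGA/CPU target
# with the OpenVINO Inference Engine Python API. Model files are placeholders.
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()

# Load the IR (XML topology + BIN weights) produced by the Model Optimizer.
net = ie.read_network(model="model.xml", weights="model.bin")

# Compile for the heterogeneous target: layers supported by the FPGA plugin
# run on the Arria 10 GX; the remaining layers fall back to the CPU.
exec_net = ie.load_network(network=net, device_name="HETERO:FPGA,CPU")

# Build a dummy FP32 input matching the network's expected NCHW shape.
input_name = next(iter(net.input_info))
n, c, h, w = net.input_info[input_name].input_data.shape
image = np.random.rand(n, c, h, w).astype(np.float32)

# Run synchronous inference and inspect the first output blob.
result = exec_net.infer(inputs={input_name: image})
output_name = next(iter(net.outputs))
print(result[output_name].shape)
```

Timing the `exec_net.infer` call over repeated runs (after a warm-up pass) is one simple way to obtain the per-model latency figures that a comparison like the 3.6× CPU-versus-FPGA/CPU speedup is based on.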