Fatima Zahra Guerrouj, Mohamed Abouzahir, M. Ramzi, E. Abdali
2021 4th International Symposium on Advanced Electrical and Communication Technologies (ISAECT), published 2021-12-06. DOI: 10.1109/ISAECT53699.2021.9668607
Analysis of the acceleration of deep learning inference models on a heterogeneous architecture based on OpenVINO
Convolutional neural networks (CNNs) are a powerful tool for many applications, and this capability is in high demand in embedded systems for video surveillance, speech recognition, and image analysis. However, their high computational intensity limits the use of CNNs in real-time domains where computational speed is critical, so an appropriate accelerator is required to meet these constraints. On the one hand, GPUs are widely used to accelerate CNNs, but at the cost of high power dissipation. On the other hand, FPGA implementations are gaining momentum due to their low power consumption and easy reconfigurability. In this work, we evaluate the inference performance of 10 classification models and 9 object detection models using the OpenVINO toolkit. In addition, we analyze the implementation of these models on the DE5a-Net DDR4 board equipped with an Arria 10 GX FPGA. The results show that full-precision (FP32) classification models on a heterogeneous FPGA/CPU architecture run on average 3.6× faster than on the CPU alone.
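The heterogeneous FPGA/CPU execution described above is typically expressed in OpenVINO through the HETERO plugin, which runs supported layers on the FPGA and falls back to the CPU for the rest. The sketch below illustrates that kind of benchmarking flow; it is not the paper's code. The model path, run count, and benchmarking loop are assumptions, and the `openvino.runtime` API shown is from OpenVINO 2022+ (the paper's 2021 toolkit used the older Inference Engine API).

```python
# Hedged sketch of an OpenVINO latency comparison between CPU-only and
# heterogeneous FPGA/CPU execution. Assumes an IR model (.xml/.bin) produced
# by the Model Optimizer; names and paths here are illustrative only.

def device_string(use_fpga: bool) -> str:
    """Pick the OpenVINO device: heterogeneous FPGA/CPU or CPU-only."""
    # HETERO tries the first device (FPGA) and falls back to the CPU
    # for any layers the FPGA plugin does not support.
    return "HETERO:FPGA,CPU" if use_fpga else "CPU"

def benchmark(model_xml: str, use_fpga: bool, runs: int = 100) -> float:
    """Average synchronous inference latency in seconds over `runs` requests."""
    import time
    import numpy as np
    from openvino.runtime import Core  # requires the `openvino` package

    core = Core()
    model = core.read_model(model_xml)  # FP32 IR, e.g. "resnet50.xml"
    compiled = core.compile_model(model, device_string(use_fpga))
    inp = compiled.input(0)
    dummy = np.random.rand(*inp.shape).astype(np.float32)  # synthetic input

    start = time.perf_counter()
    for _ in range(runs):
        compiled([dummy])  # synchronous inference request
    return (time.perf_counter() - start) / runs
```

A comparison like the paper's would then divide `benchmark(path, use_fpga=False)` by `benchmark(path, use_fpga=True)` per model to obtain the speedup factor.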