A Heterogeneous Full-stack AI Platform for Performance Monitoring and Hardware-specific Optimizations
Zikang Zhou, Chao-ying Fu, Ruiqi Xie, Jun Han
2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), December 2021
DOI: 10.1109/MCSoC51149.2021.00032
Citations: 0
Abstract
Many hardware accelerators have been proposed to speed up DNN computation and meet real-time application requirements. However, constrained by the microarchitecture of each accelerator, the same neural network generally shows large performance differences when deployed on different accelerators. This forces network designers to rethink network structure from a hardware perspective; such a design effort is more likely to achieve good performance on the targeted accelerator. In this paper, in order to explore hardware-specific optimizations, we design a full-stack heterogeneous evaluation platform with a monitoring function, based on the open-source neural network accelerator NVDLA and the TVM compiler. The platform integrates two processors, with Arm and RISC-V instruction sets, and a DNN accelerator; DNNs built with common frameworks (PyTorch, Keras, ONNX, etc.) can be deployed on the platform through a simple process to analyze their adaptability to the hardware. Using the platform, we conduct experiments that demonstrate how neural network structure affects the performance of a specific hardware design. The experimental results show that an ill-suited network structure causes additional data transfers on the target hardware, which are the main source of performance and energy degradation. The order of network operators, the width and depth of the network, and the number of operations unsupported by the accelerator all affect the performance of a network on a specific accelerator. Designers should perform targeted optimizations for the specific hardware on which a network will be deployed, and NAS (Neural Architecture Search) should take these factors into account.
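The abstract's central finding is that operators unsupported by the accelerator fall back to the host CPU, and each switch between devices adds a data transfer that dominates performance and energy loss. The following toy sketch (not the paper's tooling; the capability set and operator names are hypothetical) illustrates why the *order* of operators matters: interleaving unsupported ops forces repeated host-accelerator round-trips, while grouping them keeps transfers low.

```python
# Toy cost model: count host<->accelerator switches along a linear
# operator sequence. Ops outside the accelerator's (hypothetical)
# capability set fall back to the CPU; each device switch implies
# one extra data transfer.

ACCEL_SUPPORTED = {"conv2d", "relu", "pool", "add"}  # assumed capability list

def count_transfers(op_sequence):
    """Count device switches (i.e., data transfers) for a linear op sequence."""
    transfers = 0
    prev_device = None
    for op in op_sequence:
        device = "accel" if op in ACCEL_SUPPORTED else "cpu"
        if prev_device is not None and device != prev_device:
            transfers += 1  # crossing the device boundary moves the tensor
        prev_device = device
    return transfers

# Same operators, different ordering: interleaved unsupported ops
# (e.g. a softmax mid-network) cause far more round-trips.
interleaved = ["conv2d", "softmax", "conv2d", "softmax", "conv2d"]
grouped     = ["conv2d", "conv2d", "conv2d", "softmax", "softmax"]
print(count_transfers(interleaved))  # 4
print(count_transfers(grouped))      # 1
```

This is only a first-order illustration of the paper's observation; the real platform measures transfers with hardware performance monitoring rather than estimating them.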