A Heterogeneous Full-stack AI Platform for Performance Monitoring and Hardware-specific Optimizations
Zikang Zhou, Chao-ying Fu, Ruiqi Xie, Jun Han
2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), December 2021
DOI: 10.1109/MCSoC51149.2021.00032
Citations: 0
Abstract
Many hardware accelerators have been proposed to speed up DNN computation and meet real-time application requirements. However, constrained by the microarchitecture of each accelerator, the same neural network generally shows large performance differences when deployed on different accelerators. This forces network designers to rethink network structure from a hardware perspective; such a design effort is more likely to achieve good performance on the targeted accelerator. In this paper, in order to explore hardware-specific optimizations, we design a full-stack heterogeneous evaluation platform with a monitoring function, based on the open-source neural network accelerator NVDLA and the TVM compiler. The platform integrates two processors, with Arm and RISC-V instruction sets, and a DNN accelerator; DNNs built with common frameworks (PyTorch, Keras, ONNX, etc.) can be deployed on the platform through a simple process to analyze their adaptability to the hardware. Using the platform, we conduct experiments that demonstrate how neural network structure affects the performance of a specific hardware design. The experimental results show that an ill-suited network structure causes additional data transfers on the target hardware, which are the main source of performance and energy degradation. The order of network operators, the width and depth of the network, and the number of operations unsupported by the accelerator all affect the performance of a network on a specific accelerator. Designers should perform targeted optimizations for the specific hardware on which a network will be deployed, and NAS (Neural Architecture Search) should take these factors into account.
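The abstract's central finding is that operators unsupported by the accelerator fall back to the host CPU, and each switch between devices adds a data transfer that dominates performance and energy loss. The following toy sketch (not the paper's tooling; the capability set and operator names are hypothetical) illustrates why the *order* of operators matters: interleaving unsupported ops forces repeated host-accelerator round-trips, while grouping them keeps transfers low.

```python
# Toy cost model: count host<->accelerator switches along a linear
# operator sequence. Ops outside the accelerator's (hypothetical)
# capability set fall back to the CPU; each device switch implies
# one extra data transfer.

ACCEL_SUPPORTED = {"conv2d", "relu", "pool", "add"}  # assumed capability list

def count_transfers(op_sequence):
    """Count device switches (i.e., data transfers) for a linear op sequence."""
    transfers = 0
    prev_device = None
    for op in op_sequence:
        device = "accel" if op in ACCEL_SUPPORTED else "cpu"
        if prev_device is not None and device != prev_device:
            transfers += 1  # crossing the device boundary moves the tensor
        prev_device = device
    return transfers

# Same operators, different ordering: interleaved unsupported ops
# (e.g. a softmax mid-network) cause far more round-trips.
interleaved = ["conv2d", "softmax", "conv2d", "softmax", "conv2d"]
grouped     = ["conv2d", "conv2d", "conv2d", "softmax", "softmax"]
print(count_transfers(interleaved))  # 4
print(count_transfers(grouped))      # 1
```

This is only a first-order illustration of the paper's observation; the real platform measures transfers with hardware performance monitoring rather than estimating them.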