Title: Deploying Pre-Quantized Deep Learning Models on Heterogeneous Platforms with Operator Flow Recognition and Quantization Parameter Optimization
Authors: Kuen-Wey Lin, Yan-Ying Li, Kuan Wang, Ming-Chih Tung
Venue: 2023 9th International Conference on Applied System Innovation (ICASI)
Published: 2023-04-21
DOI: 10.1109/ICASI57738.2023.10179562
Citations: 0
Abstract
Quantized deep learning models are well suited to embedded devices with limited computation resources. For computation-intensive neural network operators such as convolution, heterogeneous platforms with a set of processing units of different types have become common in embedded devices. These devices usually operate on fixed-point arithmetic; moreover, they rely on customized kernel functions to deploy deep learning models. In this paper, a flow for deploying pre-quantized deep learning models on heterogeneous platforms using TVM is presented. We propose an optimization that converts quantization parameters. To leverage customized kernel functions, we propose operator flow recognition. To demonstrate our flow, we utilize embARC Machine Learning Inference (embARC MLI), an open-source software library targeting low-power applications. A set of pre-quantized deep learning models is deployed on a heterogeneous platform comprising x86 and embARC MLI. Experimental results show that for each model, the accuracy obtained on the heterogeneous platform is nearly identical to that obtained on an x86 platform.
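The abstract does not detail the authors' quantization-parameter conversion, but the underlying problem is standard: fixed-point hardware cannot apply a floating-point quantization scale directly, so the scale is typically rewritten as an integer multiplier plus a bit shift. The sketch below illustrates that general technique only; the function names and the 15-bit multiplier width are illustrative assumptions, not taken from the paper.

```python
import math

def scale_to_fixed_point(scale: float, bits: int = 15):
    """Approximate a float quantization scale as (multiplier, shift) so that
    scale ~= multiplier / 2**shift, with the multiplier fitting in `bits` bits.
    Illustrative helper; not the paper's algorithm."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    # scale = mantissa * 2**exponent, with mantissa in [0.5, 1)
    mantissa, exponent = math.frexp(scale)
    multiplier = round(mantissa * (1 << bits))  # fixed-point mantissa
    shift = bits - exponent                     # scale ~= multiplier / 2**shift
    return multiplier, shift

def requantize(acc: int, multiplier: int, shift: int) -> int:
    """Apply the fixed-point scale to an integer accumulator (e.g. an int32
    convolution result) using round-to-nearest, integer ops only."""
    rounding = 1 << (shift - 1)
    return (acc * multiplier + rounding) >> shift
```

For example, a scale of 0.5 becomes the pair (16384, 15), so scaling an accumulator of 1000 is computed entirely in integer arithmetic as (1000 * 16384 + 16384) >> 15 = 500, which is the kind of transformation an integer-only target such as embARC MLI requires.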