{"title":"Benchmarking TensorFlow Lite Quantization Algorithms for Deep Neural Networks","authors":"Ioan Lucan Orășan, Ciprian Seiculescu, C. Căleanu","doi":"10.1109/SACI55618.2022.9919465","DOIUrl":null,"url":null,"abstract":"Deploying deep neural network models on the resource constrained devices, e.g., lost-cost microcontrollers, is challenging because they are mostly limited in terms of memory footprint and computation capabilities. Quantization is one of the widely used solutions to reduce the size of a model. For parameter representation, it employs for example just 8-bit integer or less instead of 32-bit floating point. The TensorFlow Lite deep learning framework currently provides four methods for post-training quantization. The aim of this paper is to benchmark these quantization methods using various deep neural models of different sizes. The main outcomes of the paper are: (1) the compression ratio obtained for each quantization method for deep neural models of small, medium, and large sizes, (2) a comparison of the accuracy results relative to the original accuracy, and (3) a viewpoint for the decision to choose the quantization method depending on the model size.","PeriodicalId":105691,"journal":{"name":"2022 IEEE 16th International Symposium on Applied Computational Intelligence and Informatics (SACI)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 16th International Symposium on Applied Computational Intelligence and Informatics (SACI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SACI55618.2022.9919465","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Deploying deep neural network models on resource-constrained devices, e.g., low-cost microcontrollers, is challenging because such devices are limited in memory footprint and computational capability. Quantization is one of the most widely used techniques for reducing model size: it represents parameters with, for example, 8-bit integers or less instead of 32-bit floating point. The TensorFlow Lite deep learning framework currently provides four methods for post-training quantization. The aim of this paper is to benchmark these quantization methods using deep neural models of various sizes. The main outcomes of the paper are: (1) the compression ratio obtained by each quantization method on small, medium, and large deep neural models, (2) a comparison of the resulting accuracy relative to the original accuracy, and (3) guidance for choosing a quantization method depending on model size.
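For context, a minimal sketch of how the four post-training quantization paths are typically invoked through TensorFlow Lite's `tf.lite.TFLiteConverter` API follows. The abstract does not name the four methods; those shown (dynamic range, full integer, float16, and 16x8) are the four documented in the TensorFlow Lite guide and are presumed to be the ones benchmarked. The model path, input shape, and random calibration data are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of TensorFlow Lite post-training quantization.
# SAVED_MODEL_DIR and the (1, 224, 224, 3) input shape are placeholders.
import numpy as np
import tensorflow as tf

SAVED_MODEL_DIR = "path/to/saved_model"  # placeholder path, not from the paper

def representative_dataset():
    """Yields calibration samples for activation-range estimation."""
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

def new_converter():
    # A fresh converter per method avoids settings leaking between runs.
    return tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)

# 1) Dynamic range quantization: 8-bit weights, float activations at runtime.
conv = new_converter()
conv.optimizations = [tf.lite.Optimize.DEFAULT]
dynamic_range_tflite = conv.convert()

# 2) Full integer quantization: 8-bit weights and activations; requires a
#    representative dataset to calibrate activation ranges.
conv = new_converter()
conv.optimizations = [tf.lite.Optimize.DEFAULT]
conv.representative_dataset = representative_dataset
conv.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
conv.inference_input_type = tf.int8
conv.inference_output_type = tf.int8
full_integer_tflite = conv.convert()

# 3) Float16 quantization: weights stored as 16-bit floats, roughly halving
#    model size while keeping float execution.
conv = new_converter()
conv.optimizations = [tf.lite.Optimize.DEFAULT]
conv.target_spec.supported_types = [tf.float16]
float16_tflite = conv.convert()

# 4) 16x8 quantization: 16-bit activations with 8-bit weights, trading some
#    compression for accuracy on activation-sensitive models.
conv = new_converter()
conv.optimizations = [tf.lite.Optimize.DEFAULT]
conv.representative_dataset = representative_dataset
conv.target_spec.supported_ops = [
    tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8
]
int16x8_tflite = conv.convert()
```

Under this setup, a compression ratio of the kind the paper reports could be estimated by comparing `len(<variant>_tflite)` in bytes against the original SavedModel's size on disk.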