{"title":"Multi-Model Inference Acceleration on Embedded Multi-Core Processors","authors":"Peiqi Shi, Feng Gao, Songtao Liang, Shanjin Yu","doi":"10.1109/ICHCI51889.2020.00090","DOIUrl":null,"url":null,"abstract":"The predominant resource efficient approaches that enable on-device inference include designing lightweight DNN architectures like MobileNets, SqueezeNets, compressing model using techniques such as network pruning, vector quantization, distillation, binarization. Recent research on using dynamic layer-wise partitioning and partial execution of CNN based model inference also make it possible to co-inference on memory and computation resource constrained devices. However, these approaches have their own bottleneck, lightweight DNN architectures and model compression usually compromise accuracy in order to deploy on resource constrained devices, dynamic model partitioning efficiency depends heavily on the network condition. This paper proposes an approach for multimodel inference acceleration on heterogeneous devices. The idea is to deploy multiple single object detection model instead of one heavy multiple object, this is because in most cases it only needs to detect one or two objects in one scenario and single object detection model weight could be lighter for the same resolution quality and require less resource. Moreover, in cloud-edge-device scenario, with the help of a scheduler policy, it is possible to gradually update models in need.","PeriodicalId":355427,"journal":{"name":"2020 International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICHCI51889.2020.00090","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The predominant resource-efficient approaches that enable on-device inference include designing lightweight DNN architectures such as MobileNets and SqueezeNets, and compressing models using techniques such as network pruning, vector quantization, distillation, and binarization. Recent research on dynamic layer-wise partitioning and partial execution of CNN-based model inference has also made co-inference possible on memory- and computation-constrained devices. However, these approaches have their own bottlenecks: lightweight DNN architectures and model compression usually compromise accuracy in order to fit resource-constrained devices, while the efficiency of dynamic model partitioning depends heavily on network conditions. This paper proposes an approach for multi-model inference acceleration on heterogeneous devices. The idea is to deploy multiple single-object detection models instead of one heavy multi-object detection model, because in most cases only one or two objects need to be detected in a given scenario, and a single-object detection model can be lighter for the same resolution quality and require fewer resources. Moreover, in a cloud-edge-device scenario, a scheduler policy makes it possible to gradually update models as needed.
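The abstract only sketches the scheduling idea at a high level. Below is a minimal illustrative sketch, not the authors' implementation, of how a device-side scheduler might dispatch lightweight single-object detectors per scenario instead of running one heavy multi-object model; all names (ModelScheduler, the detector registry) are hypothetical.

```python
# Illustrative sketch only: a hypothetical scheduler that runs lightweight
# single-object detectors per scenario instead of one heavy multi-object model.
# Class and function names are assumptions, not the paper's actual API.
from typing import Callable, Dict, List

Detection = Dict[str, float]           # e.g. {"score": 0.92, "x": ..., "y": ...}
Detector = Callable[[bytes], List[Detection]]


class ModelScheduler:
    """Keeps a registry of single-object detectors and runs only the
    ones the current scenario needs (e.g. 'person' and 'vehicle')."""

    def __init__(self) -> None:
        self._registry: Dict[str, Detector] = {}

    def register(self, object_class: str, detector: Detector) -> None:
        self._registry[object_class] = detector

    def infer(self, frame: bytes, needed_classes: List[str]) -> Dict[str, List[Detection]]:
        # Run only the detectors required by this scenario; in a
        # cloud-edge-device setup, a missing model could be fetched
        # from the edge or cloud here before running it.
        results: Dict[str, List[Detection]] = {}
        for cls in needed_classes:
            detector = self._registry.get(cls)
            if detector is not None:
                results[cls] = detector(frame)
        return results


if __name__ == "__main__":
    # Dummy detectors standing in for per-class lightweight models.
    scheduler = ModelScheduler()
    scheduler.register("person", lambda frame: [{"score": 0.92}])
    scheduler.register("vehicle", lambda frame: [{"score": 0.88}])
    # A scenario that only needs person detection never loads or runs
    # the vehicle model, which is the resource saving the paper targets.
    print(scheduler.infer(b"fake-frame-bytes", ["person"]))
```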