{"title":"Multi-Model Inference Acceleration on Embedded Multi-Core Processors","authors":"Peiqi Shi, Feng Gao, Songtao Liang, Shanjin Yu","doi":"10.1109/ICHCI51889.2020.00090","DOIUrl":null,"url":null,"abstract":"The predominant resource efficient approaches that enable on-device inference include designing lightweight DNN architectures like MobileNets, SqueezeNets, compressing model using techniques such as network pruning, vector quantization, distillation, binarization. Recent research on using dynamic layer-wise partitioning and partial execution of CNN based model inference also make it possible to co-inference on memory and computation resource constrained devices. However, these approaches have their own bottleneck, lightweight DNN architectures and model compression usually compromise accuracy in order to deploy on resource constrained devices, dynamic model partitioning efficiency depends heavily on the network condition. This paper proposes an approach for multimodel inference acceleration on heterogeneous devices. The idea is to deploy multiple single object detection model instead of one heavy multiple object, this is because in most cases it only needs to detect one or two objects in one scenario and single object detection model weight could be lighter for the same resolution quality and require less resource. Moreover, in cloud-edge-device scenario, with the help of a scheduler policy, it is possible to gradually update models in need.","PeriodicalId":355427,"journal":{"name":"2020 International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICHCI51889.2020.00090","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The predominant resource-efficient approaches that enable on-device inference include designing lightweight DNN architectures such as MobileNets and SqueezeNets, and compressing models using techniques such as network pruning, vector quantization, distillation, and binarization. Recent research on dynamic layer-wise partitioning and partial execution of CNN-based model inference has also made co-inference possible on memory- and computation-constrained devices. However, these approaches have their own bottlenecks: lightweight DNN architectures and model compression usually compromise accuracy in order to fit resource-constrained devices, while the efficiency of dynamic model partitioning depends heavily on network conditions. This paper proposes an approach for multi-model inference acceleration on heterogeneous devices. The idea is to deploy multiple single-object detection models instead of one heavy multi-object detection model, because in most cases only one or two objects need to be detected in a given scenario, and a single-object detection model can be lighter for the same resolution quality and require fewer resources. Moreover, in a cloud-edge-device scenario, a scheduler policy makes it possible to gradually update models as needed.
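The abstract only sketches the scheduling idea at a high level. Below is a minimal illustrative sketch, not the authors' implementation, of how a device-side scheduler might dispatch lightweight single-object detectors per scenario instead of running one heavy multi-object model; all names (ModelScheduler, the detector registry) are hypothetical.

```python
# Illustrative sketch only: a hypothetical scheduler that runs lightweight
# single-object detectors per scenario instead of one heavy multi-object model.
# Class and function names are assumptions, not the paper's actual API.
from typing import Callable, Dict, List

Detection = Dict[str, float]           # e.g. {"score": 0.92, "x": ..., "y": ...}
Detector = Callable[[bytes], List[Detection]]


class ModelScheduler:
    """Keeps a registry of single-object detectors and runs only the
    ones the current scenario needs (e.g. 'person' and 'vehicle')."""

    def __init__(self) -> None:
        self._registry: Dict[str, Detector] = {}

    def register(self, object_class: str, detector: Detector) -> None:
        self._registry[object_class] = detector

    def infer(self, frame: bytes, needed_classes: List[str]) -> Dict[str, List[Detection]]:
        # Run only the detectors required by this scenario; in a
        # cloud-edge-device setup, a missing model could be fetched
        # from the edge or cloud here before running it.
        results: Dict[str, List[Detection]] = {}
        for cls in needed_classes:
            detector = self._registry.get(cls)
            if detector is not None:
                results[cls] = detector(frame)
        return results


if __name__ == "__main__":
    # Dummy detectors standing in for per-class lightweight models.
    scheduler = ModelScheduler()
    scheduler.register("person", lambda frame: [{"score": 0.92}])
    scheduler.register("vehicle", lambda frame: [{"score": 0.88}])
    # A scenario that only needs person detection never loads or runs
    # the vehicle model, which is the resource saving the paper targets.
    print(scheduler.infer(b"fake-frame-bytes", ["person"]))
```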