Multi-Model Inference Acceleration on Embedded Multi-Core Processors

Peiqi Shi, Feng Gao, Songtao Liang, Shanjin Yu
{"title":"嵌入式多核处理器的多模型推理加速","authors":"Peiqi Shi, Feng Gao, Songtao Liang, Shanjin Yu","doi":"10.1109/ICHCI51889.2020.00090","DOIUrl":null,"url":null,"abstract":"The predominant resource efficient approaches that enable on-device inference include designing lightweight DNN architectures like MobileNets, SqueezeNets, compressing model using techniques such as network pruning, vector quantization, distillation, binarization. Recent research on using dynamic layer-wise partitioning and partial execution of CNN based model inference also make it possible to co-inference on memory and computation resource constrained devices. However, these approaches have their own bottleneck, lightweight DNN architectures and model compression usually compromise accuracy in order to deploy on resource constrained devices, dynamic model partitioning efficiency depends heavily on the network condition. This paper proposes an approach for multimodel inference acceleration on heterogeneous devices. The idea is to deploy multiple single object detection model instead of one heavy multiple object, this is because in most cases it only needs to detect one or two objects in one scenario and single object detection model weight could be lighter for the same resolution quality and require less resource. Moreover, in cloud-edge-device scenario, with the help of a scheduler policy, it is possible to gradually update models in need.","PeriodicalId":355427,"journal":{"name":"2020 International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Multi-Model Inference Acceleration on Embedded Multi-Core Processors\",\"authors\":\"Peiqi Shi, Feng Gao, Songtao Liang, Shanjin Yu\",\"doi\":\"10.1109/ICHCI51889.2020.00090\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The predominant resource efficient approaches that enable on-device inference include designing lightweight DNN architectures like MobileNets, SqueezeNets, compressing model using techniques such as network pruning, vector quantization, distillation, binarization. Recent research on using dynamic layer-wise partitioning and partial execution of CNN based model inference also make it possible to co-inference on memory and computation resource constrained devices. However, these approaches have their own bottleneck, lightweight DNN architectures and model compression usually compromise accuracy in order to deploy on resource constrained devices, dynamic model partitioning efficiency depends heavily on the network condition. This paper proposes an approach for multimodel inference acceleration on heterogeneous devices. The idea is to deploy multiple single object detection model instead of one heavy multiple object, this is because in most cases it only needs to detect one or two objects in one scenario and single object detection model weight could be lighter for the same resolution quality and require less resource. 
Moreover, in cloud-edge-device scenario, with the help of a scheduler policy, it is possible to gradually update models in need.\",\"PeriodicalId\":355427,\"journal\":{\"name\":\"2020 International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICHCI51889.2020.00090\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICHCI51889.2020.00090","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The predominant resource-efficient approaches that enable on-device inference include designing lightweight DNN architectures such as MobileNets and SqueezeNets, and compressing models with techniques such as network pruning, vector quantization, distillation, and binarization. Recent research on dynamic layer-wise partitioning and partial execution of CNN-based model inference has also made co-inference possible on memory- and computation-constrained devices. However, these approaches have their own bottlenecks: lightweight DNN architectures and model compression usually compromise accuracy in order to fit resource-constrained devices, and the efficiency of dynamic model partitioning depends heavily on network conditions. This paper proposes an approach for multi-model inference acceleration on heterogeneous devices. The idea is to deploy multiple single-object detection models instead of one heavy multi-object detection model: in most cases only one or two objects need to be detected in a given scenario, and a single-object detection model can be lighter for the same resolution quality while requiring fewer resources. Moreover, in a cloud-edge-device scenario, a scheduler policy makes it possible to gradually update the models that are needed.
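The core idea lends itself to a short illustration. The following Python sketch is not the authors' implementation: the names (SingleObjectModel, Scheduler, run_model, the "doorway" scenario) are hypothetical, and real DNN inference is replaced by a dummy result. It only shows how a scheduler policy might dispatch the single-object detectors relevant to the current scenario across worker processes, one lightweight model per core.

# Minimal sketch of the multi-model idea: several lightweight single-object
# detectors run in parallel, one per worker process, instead of a single
# heavy multi-object detector. All names here are hypothetical illustrations,
# not the paper's API.
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    score: float

class SingleObjectModel:
    """Stand-in for one lightweight, single-class detector."""
    def __init__(self, label: str):
        self.label = label

    def detect(self, frame) -> Detection:
        # A real implementation would run DNN inference on `frame`;
        # a dummy result keeps the sketch self-contained and runnable.
        return Detection(self.label, score=0.9)

def run_model(label: str, frame) -> Detection:
    # Each worker process loads only the model it needs,
    # so per-core memory stays small.
    return SingleObjectModel(label).detect(frame)

class Scheduler:
    """Dispatches only the detectors relevant to the current scenario."""
    def __init__(self, scenario_models: dict):
        # scenario name -> list of object labels to detect in that scenario
        self.scenario_models = scenario_models

    def infer(self, scenario: str, frame, pool: ProcessPoolExecutor):
        labels = self.scenario_models[scenario]
        futures = [pool.submit(run_model, label, frame) for label in labels]
        return [f.result() for f in futures]

if __name__ == "__main__":
    scheduler = Scheduler({"doorway": ["person", "package"]})
    with ProcessPoolExecutor(max_workers=2) as pool:
        print(scheduler.infer("doorway", frame=None, pool=pool))

Because each worker holds only one small model, per-core memory stays low, and the scheduler's scenario table would be the natural place for a cloud-edge update policy to swap in refreshed models gradually, as the abstract suggests.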