EdgeCI: Distributed Workload Assignment and Model Partitioning for CNN Inference on Edge Clusters

IF 3.9, Zone 3 (Computer Science), JCR Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS
Yanming Chen, Tong Luo, Weiwei Fang, Neal N. Xiong
Journal: ACM Transactions on Internet Technology
Volume: 24 1
Publication date: 2024-04-02
Publication type: Journal Article
DOI: 10.1145/3656041 (https://doi.org/10.1145/3656041)
Citations: 0

Abstract

Deep learning has spread rapidly into new application scenarios such as smart cities and driverless vehicles, but deploying it consumes substantial resources. It is usually difficult to execute inference tasks solely on resource-constrained intelligent Internet-of-Things (IoT) devices while meeting strict service-delay requirements. CNN-based inference tasks are therefore usually offloaded to edge servers or the cloud. However, offloading may lead to unstable performance and privacy leaks. To address these challenges, this paper designs a low-latency distributed inference framework, EdgeCI, which assigns inference tasks to clusters of locally idle, connected, resource-constrained IoT devices. EdgeCI exploits two key optimization knobs: (1) an Auction-based Workload Assignment Scheme (AWAS), which balances the workload by assigning each workload partition to a better-matching IoT device; and (2) a Fused-Layer parallelization strategy based on non-recursive Dynamic Programming (DPFL), which aims to further minimize inference time. We have implemented EdgeCI on PyTorch and evaluated its performance with the VGG-16 and ResNet-34 image recognition models. The experimental results show that the proposed AWAS and DPFL outperform typical state-of-the-art solutions. When the two are combined, EdgeCI improves inference speed by 34.72% to 43.52%. EdgeCI outperforms the state-of-the-art approaches on the tested platform.
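The abstract does not give the details of DPFL, so the following is only a minimal sketch of the general idea it names: a non-recursive (bottom-up) dynamic program that splits a chain of layers into consecutive fused blocks. It is not the authors' algorithm. The function names, the constants, and the cost model (per-layer compute times, a fixed synchronization cost per block, and an overhead that grows with fusion depth) are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical toy parameters; none of these values come from the paper.
LAYER_COMPUTE = [4.0, 3.0, 2.0, 2.0, 1.0]  # assumed per-layer compute time
N_DEVICES = 4        # assumed number of cooperating devices
SYNC_COST = 1.0      # assumed cost of synchronizing after each fused block
OVERLAP_COST = 0.5   # assumed redundant-computation penalty factor


def toy_block_cost(start: int, end: int) -> float:
    """Assumed cost of running layers [start, end) as one fused block."""
    depth = end - start
    compute = sum(LAYER_COMPUTE[start:end]) / N_DEVICES
    return compute + SYNC_COST + OVERLAP_COST * (depth - 1) ** 2


def partition_layers(
    n_layers: int, block_cost: Callable[[int, int], float]
) -> Tuple[float, List[Tuple[int, int]]]:
    """Split layers 0..n_layers-1 into consecutive blocks of minimum total cost.

    Fills the table iteratively (no recursion): best[end] is the minimum
    cost of executing the first `end` layers.
    """
    best = [float("inf")] * (n_layers + 1)
    best[0] = 0.0
    cut = [0] * (n_layers + 1)  # cut[end] = start of the last block ending at `end`
    for end in range(1, n_layers + 1):
        for start in range(end):
            cost = best[start] + block_cost(start, end)
            if cost < best[end]:
                best[end] = cost
                cut[end] = start
    # Walk the cut table backwards to recover the chosen blocks.
    blocks: List[Tuple[int, int]] = []
    end = n_layers
    while end > 0:
        blocks.append((cut[end], end))
        end = cut[end]
    blocks.reverse()
    return best[n_layers], blocks


total, blocks = partition_layers(len(LAYER_COMPUTE), toy_block_cost)
print(total, blocks)
```

With this toy cost model, fusing more layers saves synchronization cost but pays a growing overhead, so the table fill settles on a few short blocks rather than one large block or five single-layer blocks. A real system would replace `toy_block_cost` with measured or modeled latencies.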

Source journal
ACM Transactions on Internet Technology (Engineering & Technology: Computer Science, Software Engineering)
CiteScore: 10.30
Self-citation rate: 1.90%
Articles published: 137
Review time: >12 weeks
About the journal: ACM Transactions on Internet Technology (TOIT) brings together many computing disciplines, including computer software engineering, computer programming languages, middleware, database management, security, knowledge discovery and data mining, networking and distributed systems, communications, and performance and scalability. TOIT will cover the results and roles of the individual disciplines and the relationships among them.