Optimizing CNN inference speed over big social data through efficient model parallelism for sustainable web of things

IF 3.4 | Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS

Journal of Parallel and Distributed Computing | Pub Date: 2024-05-31 | DOI: 10.1016/j.jpdc.2024.104927

Yuhao Hu, Xiaolong Xu, Muhammad Bilal, Weiyi Zhong, Yuwen Liu, Huaizhen Kou, Lingzhen Kong

Volume 192, Article 104927. Full text: https://www.sciencedirect.com/science/article/pii/S0743731524000911

Citations: 0

Abstract

The rapid development of artificial intelligence and networking technologies has driven the popularity of intelligent services based on deep learning in recent years, which in turn fosters the advancement of the Web of Things (WoT). Big social data (BSD) plays an important role in the processing of intelligent services in WoT. However, intelligent BSD services are computationally intensive and require ultra-low latency, which end or edge devices with limited computing power cannot deliver on their own. Distributed inference of deep neural networks (DNNs), which allocates the computing load of a DNN across several devices, is considered a feasible solution. In this work, an efficient model parallelism method that couples convolution layer (Conv) splitting with resource allocation is proposed. First, given a random computing resource allocation strategy, the Conv split decision is made through mathematical analysis to realize parallel inference of convolutional neural networks (CNNs). Next, deep reinforcement learning is used to obtain the optimal computing resource allocation strategy, maximizing the resource utilization rate and minimizing the CNN inference latency. Finally, simulation results show that the approach outperforms the baselines and is applicable to BSD services in WoT under high workload.
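The idea of splitting a convolution layer across devices can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not give the split rule, so the sketch assumes a simple row-wise split in which each device receives a contiguous band of output rows proportional to its computing capacity, plus the overlapping input rows that band needs. The function names (`split_rows`, `conv2d_valid`, `parallel_conv`), the single-channel square kernel, and the proportional rule are all assumptions made for illustration.

```python
def split_rows(out_h, capacities):
    """Assign contiguous output-row ranges proportional to device capacity.

    Hypothetical rule for illustration; the paper derives its own split.
    """
    total = float(sum(capacities))
    bounds, acc = [], 0.0
    for c in capacities:
        acc += c
        bounds.append(int(acc / total * out_h + 0.5))
    bounds[-1] = out_h  # make sure every output row is covered
    starts = [0] + bounds[:-1]
    return list(zip(starts, bounds))


def conv2d_valid(x, w, stride=1):
    """Naive single-channel 2D convolution (cross-correlation), no padding.

    Assumes a square kernel w of size k x k.
    """
    k = len(w)
    out_h = (len(x) - k) // stride + 1
    out_w = (len(x[0]) - k) // stride + 1
    return [
        [
            sum(x[i * stride + a][j * stride + b] * w[a][b]
                for a in range(k) for b in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def parallel_conv(x, w, capacities, stride=1):
    """Simulate row-wise Conv split: each 'device' computes its own band."""
    k = len(w)
    out_h = (len(x) - k) // stride + 1
    result = []
    for start, end in split_rows(out_h, capacities):
        if end == start:
            continue  # this device received no rows
        # Input rows needed for output rows [start, end), including overlap.
        in_lo = start * stride
        in_hi = (end - 1) * stride + k
        result.extend(conv2d_valid(x[in_lo:in_hi], w, stride))
    return result
```

Concatenating the per-device bands reproduces the output of the unsplit layer, which is what makes this kind of split usable for parallel inference. In this sketch the capacities are fixed inputs; in the paper they come from the resource allocation strategy that deep reinforcement learning optimizes.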

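Source journal

Journal of Parallel and Distributed Computing (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 10.30

Self-citation rate: 2.60%

Articles published: 172

Review time: 12 months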

About the journal: This international journal is directed to researchers, engineers, educators, managers, programmers, and users of computers who have particular interests in parallel processing and/or distributed computing. The Journal of Parallel and Distributed Computing publishes original research papers and timely review articles on the theory, design, evaluation, and use of parallel and/or distributed computing systems. The journal also features special issues on these topics, again covering the full range from the design to the use of the targeted systems.