Evaluating the Performance of Deep Learning Inference Service on Edge Platform

Hyun-Hwa Choi, Jae-Geun Cha, Seung Hyun Yun, Dae-Won Kim, Sumin Jang, S. Kim
2021 International Conference on Information and Communication Technology Convergence (ICTC), October 20, 2021. DOI: 10.1109/ICTC52510.2021.9620870

Abstract

Deep learning inference requires a tremendous amount of computation and is typically offloaded to the cloud for execution. Recently, edge computing, which processes and stores data at the edge of the Internet, closest to mobile devices and sensors, has been considered a new computing paradigm. We have studied the performance of deep neural network (DNN) inference services under different configurations of the resources assigned to a container. In this work, we measured and analyzed a real-world edge service on a containerization platform. The edge service, named A!Eye, is an application that performs various DNN inferences. The service comprises both CPU-friendly and GPU-friendly tasks, and the CPU tasks account for more than half of its latency. Our analyses reveal interesting findings about running a DNN inference service on a container-based execution platform: (a) the latency of DNN-inference-based edge services is affected by the performance of CPU-bound operations; (b) pinning CPUs can reduce the latency of an edge service; and (c) to improve the performance of an edge service, it is important to avoid bottlenecks on the PCIe bus shared by resources such as CPUs, GPUs, and NICs.
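Finding (b), pinning a container's workload to fixed CPU cores, can be sketched with a minimal example. The helper below (a hypothetical illustration, not code from the paper) uses Linux's `sched_setaffinity` to restrict a process to a chosen core set, which is the process-level equivalent of launching a container with Docker's `--cpuset-cpus` option; the core IDs are illustrative, and in practice one would pick cores on the same NUMA node as the GPU and NIC to limit contention on the shared PCIe path (finding (c)).

```python
import os

def pin_to_cores(cores):
    """Restrict the calling process to the given CPU core IDs (Linux only).

    Roughly equivalent to running a container with
    `docker run --cpuset-cpus=<ids>`.
    """
    os.sched_setaffinity(0, set(cores))  # pid 0 = the current process
    return os.sched_getaffinity(0)       # report the effective affinity

if __name__ == "__main__":
    # Example: pin an inference worker to cores 0 and 1 (illustrative IDs).
    print(pin_to_cores({0, 1}))
```

At the container level, the same effect is obtained declaratively (e.g. `docker run --cpuset-cpus="0,1" ...`), which keeps the service's threads from migrating across cores and, as the abstract reports, can reduce end-to-end latency.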