A Service Management Method for Distributed Deep Learning

Seungwoo Kum, Seungtaek Oh, Jaewon Moon
{"title":"A Service Management Method for Distributed Deep Learning","authors":"Seungwoo Kum, Seungtaek Oh, Jaewon Moon","doi":"10.1109/ICTC52510.2021.9621013","DOIUrl":null,"url":null,"abstract":"With the advance of deep learning technologies, many applications and/or services that rely on them can be easily found these days. Applications relying on deep learning varies from video, audio, text and time-series data, and they provide high-accuracy services that are built with the software platforms such as TensorFlow or PyTorch. Usually, a deep learning service requires rich resources such as GPU and large memory. For instance, GPT-3 requires memories up to a few hundred gigabytes, and for the video processing it needs accelerators such as GPU. The cost will be increased if all the resources are on the cloud, and there are many works on offloading these workloads of deep learning onto distributed infrastructure. One of the focuses of these works are distribution of deep learning workloads onto various resource and providing an end-to-end service by the combination of them. Edge computing or Fog computing is one of the architectures providing workload distribution method from cloud to edge resources. This paper proposes a method that enables autonomous configuration between distributed services. In the proposed method, the composition of distributed services is described in systematic way so to configure the connections between them more intuitively. 
Further, the proposed method includes binding of a resource to a service which enables management of multiple service distributions, and how it can work with existing standards.","PeriodicalId":299175,"journal":{"name":"2021 International Conference on Information and Communication Technology Convergence (ICTC)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Information and Communication Technology Convergence (ICTC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICTC52510.2021.9621013","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

With the advance of deep learning technologies, applications and services that rely on them are now commonplace. Applications relying on deep learning span video, audio, text, and time-series data, and they provide high-accuracy services built with software platforms such as TensorFlow or PyTorch. A deep learning service usually requires rich resources such as GPUs and large memory. For instance, GPT-3 requires up to a few hundred gigabytes of memory, and video processing needs accelerators such as GPUs. Costs increase if all of these resources reside in the cloud, so much work has gone into offloading deep learning workloads onto distributed infrastructure. One focus of these works is distributing deep learning workloads onto various resources and providing an end-to-end service by combining them. Edge computing and fog computing are architectures that provide methods for distributing workloads from cloud to edge resources. This paper proposes a method that enables autonomous configuration between distributed services. In the proposed method, the composition of distributed services is described in a systematic way so that the connections between them can be configured more intuitively. Further, the proposed method includes the binding of a resource to a service, which enables the management of multiple service distributions, and it is shown how the method can work with existing standards.
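The abstract gives no concrete schema for how service composition and resource binding are described. As a rough illustration of the general idea (not the paper's actual method; all names, classes, and the greedy binding policy below are hypothetical), a distributed pipeline could be modeled as services with resource requirements bound against a pool of cloud and edge resources:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Resource:
    """A compute resource in the cloud-to-edge pool (names are illustrative)."""
    name: str
    gpu: bool = False
    memory_gb: int = 8

@dataclass
class Service:
    """One stage of a distributed deep learning pipeline."""
    name: str
    needs_gpu: bool = False
    min_memory_gb: int = 4
    resource: Optional[Resource] = None          # filled in by bind()
    downstream: List["Service"] = field(default_factory=list)

def bind(service: Service, pool: List[Resource]) -> None:
    """Bind the first pool resource satisfying the service's requirements
    (a greedy stand-in for a real placement/binding step)."""
    for r in pool:
        if (not service.needs_gpu or r.gpu) and r.memory_gb >= service.min_memory_gb:
            service.resource = r
            pool.remove(r)                       # each resource binds once
            return
    raise RuntimeError(f"no resource satisfies {service.name}")

# A two-stage pipeline: GPU inference at the edge, aggregation in the cloud.
infer = Service("video-inference", needs_gpu=True, min_memory_gb=16)
agg = Service("aggregation", min_memory_gb=4)
infer.downstream.append(agg)                     # connection between services

pool = [Resource("cloud-vm", memory_gb=64),
        Resource("edge-gpu-1", gpu=True, memory_gb=16)]
for svc in (infer, agg):
    bind(svc, pool)
```

Here the GPU-requiring stage lands on the edge accelerator and the lighter stage on the cloud VM, which mirrors the workload-distribution motivation in the abstract.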