Nexus: a GPU cluster engine for accelerating DNN-based video analysis

Proceedings of the 27th ACM Symposium on Operating Systems Principles. Pub Date: 2019-10-27. DOI: 10.1145/3341301.3359658

Haichen Shen, Lequn Chen, Yuchen Jin, Liangyu Zhao, Bingyu Kong, Matthai Philipose, A. Krishnamurthy, Ravi Sundaram

Citations: 151

Abstract

We address the problem of serving Deep Neural Networks (DNNs) efficiently from a cluster of GPUs. To realize the promise of very low-cost processing made by accelerators such as GPUs, it is essential to run them at sustained high utilization. Doing so requires cluster-scale resource management that performs detailed scheduling of GPUs, reasoning about groups of DNN invocations that need to be co-scheduled, and moving from the conventional whole-DNN execution model to executing fragments of DNNs. Nexus is a fully implemented system that includes these innovations. In large-scale case studies on 16 GPUs, when required to stay within latency constraints at least 99% of the time, Nexus can process requests at rates 1.8-12.7× higher than state-of-the-art systems can. A long-running multi-application deployment stays within 84% of optimal utilization and, on a 100-GPU cluster, violates latency SLOs on 0.27% of requests.


