DxPU: Large Scale Disaggregated GPU Pools in the Datacenter

IF 1.8 3区计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Architecture and Code Optimization Pub Date : 2023-10-05 DOI:10.1145/3617995

Bowen He, Xiao Zheng, Yuan Chen, Weinan Li, Yajin Zhou, Xin Long, Pengcheng Zhang, Xiaowei Lu, Linquan Jiang, Qiang Liu, Dennis Cai, Xiantao Zhang

{"title":"DxPU: Large Scale Disaggregated GPU Pools in the Datacenter","authors":"Bowen He, Xiao Zheng, Yuan Chen, Weinan Li, Yajin Zhou, Xin Long, Pengcheng Zhang, Xiaowei Lu, Linquan Jiang, Qiang Liu, Dennis Cai, Xiantao Zhang","doi":"10.1145/3617995","DOIUrl":null,"url":null,"abstract":"The rapid adoption of AI and convenience offered by cloud services have resulted in the growing demands for GPUs in the cloud. Generally, GPUs are physically attached to host servers as PCIe devices. However, the fixed assembly combination of host servers and GPUs is extremely inefficient in resource utilization, upgrade, and maintenance. Due to these issues, the GPU disaggregation technique has been proposed to decouple GPUs from host servers. It aggregates GPUs into a pool, and allocates GPU node(s) according to user demands. However, existing GPU disaggregation systems have flaws in software-hardware compatibility, disaggregation scope, and capacity. In this paper, we present a new implementation of datacenter-scale GPU disaggregation, named DxPU. DxPU efficiently solves the above problems and can flexibly allocate as many GPU node(s) as users demand. In order to understand the performance overhead incurred by DxPU, we build up a performance model for AI specific workloads. With the guidance of modeling results, we develop a prototype system, which has been deployed into the datacenter of a leading cloud provider for a test run. We also conduct detailed experiments to evaluate the performance overhead caused by our system. The results show that the overhead of DxPU is less than 10%, compared with native GPU servers, in most of user scenarios.","PeriodicalId":50920,"journal":{"name":"ACM Transactions on Architecture and Code Optimization","volume":"36 1","pages":"0"},"PeriodicalIF":1.8000,"publicationDate":"2023-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Architecture and Code Optimization","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3617995","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

Abstract

The rapid adoption of AI and convenience offered by cloud services have resulted in the growing demands for GPUs in the cloud. Generally, GPUs are physically attached to host servers as PCIe devices. However, the fixed assembly combination of host servers and GPUs is extremely inefficient in resource utilization, upgrade, and maintenance. Due to these issues, the GPU disaggregation technique has been proposed to decouple GPUs from host servers. It aggregates GPUs into a pool, and allocates GPU node(s) according to user demands. However, existing GPU disaggregation systems have flaws in software-hardware compatibility, disaggregation scope, and capacity. In this paper, we present a new implementation of datacenter-scale GPU disaggregation, named DxPU. DxPU efficiently solves the above problems and can flexibly allocate as many GPU node(s) as users demand. In order to understand the performance overhead incurred by DxPU, we build up a performance model for AI specific workloads. With the guidance of modeling results, we develop a prototype system, which has been deployed into the datacenter of a leading cloud provider for a test run. We also conduct detailed experiments to evaluate the performance overhead caused by our system. The results show that the overhead of DxPU is less than 10%, compared with native GPU servers, in most of user scenarios.

查看原文本刊更多论文

DxPU:数据中心的大规模解聚GPU池

人工智能的快速采用和云服务提供的便利性导致了对云中的gpu的需求不断增长。一般情况下，gpu作为PCIe设备物理绑定到主机服务器上。但是，主机服务器和gpu的固定组装组合在资源利用、升级和维护方面效率极低。针对这些问题，提出了GPU解聚技术，将GPU与主机服务器解耦。它将GPU聚合成一个池，并根据用户需求分配GPU节点。但是现有的GPU解聚系统在软硬件兼容性、解聚范围、容量等方面存在缺陷。在本文中，我们提出了一种新的数据中心规模的GPU分解实现，称为DxPU。DxPU有效地解决了上述问题，并可根据用户需求灵活分配任意数量的GPU节点。为了理解DxPU带来的性能开销，我们为AI特定工作负载建立了一个性能模型。在建模结果的指导下，我们开发了一个原型系统，该系统已部署到一家领先的云提供商的数据中心进行测试运行。我们还进行了详细的实验来评估系统造成的性能开销。结果表明，在大多数用户场景下，与本地GPU服务器相比，DxPU的开销小于10%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

ACM Transactions on Architecture and Code Optimization 工程技术-计算机：理论方法

CiteScore

3.60

自引率

6.20%

发文量

审稿时长

6-12 weeks

期刊介绍： ACM Transactions on Architecture and Code Optimization (TACO) focuses on hardware, software, and system research spanning the fields of computer architecture and code optimization. Articles that appear in TACO will either present new techniques and concepts or report on experiences and experiments with actual systems. Insights useful to architects, hardware or software developers, designers, builders, and users will be emphasized.