Towards optimal placement and scheduling of DNN operations with Pesto
Ubaid Ullah Hafeez, Xiao Sun, Anshul Gandhi, Zhenhua Liu
Proceedings of the 22nd International Middleware Conference, October 2021
DOI: 10.1145/3464298.3476132 (https://doi.org/10.1145/3464298.3476132)
Citations: 15
Abstract
The increasing size of Deep Neural Networks (DNNs) has necessitated the use of multiple GPUs to host a single DNN model, a practice commonly referred to as model parallelism. The key challenge for model parallelism is to efficiently and effectively partition the DNN model across GPUs to avoid communication overheads while maximizing GPU utilization, with the end goal of minimizing the training time of DNN models. Existing approaches either take a long time (hours or even days) to find an effective partition or settle for sub-optimal partitioning, invariably increasing the end-to-end training effort. In this paper, we design and implement Pesto, a fast and near-optimal model placement technique for automatically partitioning arbitrary DNNs across multiple GPUs. The key idea in Pesto is to jointly optimize the model placement and scheduling at the fine-grained operation level to minimize inter-GPU communication while maximizing the opportunity to parallelize the model across GPUs. By carefully formulating the problem as an integer program, Pesto can provide the optimal placement and scheduling. We implement Pesto in TensorFlow and show that Pesto can reduce model training time by up to 31% compared to state-of-the-art approaches, across several large DNN models.
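The abstract describes formulating joint operation placement and scheduling as an integer program. The sketch below is a rough, minimal illustration of what such a formulation can look like, not Pesto's actual model: it places a toy four-operation graph on two GPUs, charges a communication delay whenever a dependency edge crosses GPUs, and minimizes the makespan. The operation graph, costs, variable names, and the use of the PuLP solver are all illustrative assumptions.

```python
# Illustrative sketch (assumed, not Pesto's formulation): joint placement and
# scheduling of DNN operations on two GPUs as an integer program, minimizing
# the makespan with a penalty for edges that cross GPUs.
import pulp

# Toy operation graph: per-op compute times and dependency edges with comm costs.
compute = {"a": 2, "b": 3, "c": 2, "d": 4}
edges = {("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 2, ("c", "d"): 2}
gpus = [0, 1]
BIG_M = 1000  # large constant for the ordering (disjunctive) constraints

prob = pulp.LpProblem("placement_and_scheduling", pulp.LpMinimize)

# x[op][g] = 1 if op runs on GPU g; s[op] = start time; makespan = objective.
x = {o: {g: pulp.LpVariable(f"x_{o}_{g}", cat="Binary") for g in gpus} for o in compute}
s = {o: pulp.LpVariable(f"s_{o}", lowBound=0) for o in compute}
makespan = pulp.LpVariable("makespan", lowBound=0)
prob += makespan

for o in compute:
    prob += pulp.lpSum(x[o][g] for g in gpus) == 1    # each op on exactly one GPU
    prob += makespan >= s[o] + compute[o]             # makespan covers every op

# split = 1 if the two endpoints of an edge land on different GPUs.
for (u, v), comm in edges.items():
    split = pulp.LpVariable(f"split_{u}_{v}", cat="Binary")
    for g in gpus:
        prob += split >= x[u][g] - x[v][g]
    prob += s[v] >= s[u] + compute[u] + comm * split  # dependency + comm delay

# No two ops overlap on the same GPU: big-M disjunction picks an order per pair.
ops = list(compute)
for i, u in enumerate(ops):
    for v in ops[i + 1:]:
        before = pulp.LpVariable(f"before_{u}_{v}", cat="Binary")
        for g in gpus:
            same_gpu_slack = BIG_M * (2 - x[u][g] - x[v][g])  # inactive if not co-located
            prob += s[v] >= s[u] + compute[u] - BIG_M * (1 - before) - same_gpu_slack
            prob += s[u] >= s[v] + compute[v] - BIG_M * before - same_gpu_slack

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for o in compute:
    gpu = next(g for g in gpus if x[o][g].value() > 0.5)
    print(f"{o} -> GPU {gpu}, start {s[o].value()}")
print("makespan:", makespan.value())
```

Even on this toy instance, the solver trades off spreading independent operations ("b" and "c") across GPUs against the communication delays their edges incur, which is the same tension the paper's fine-grained, operation-level formulation targets at scale.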