Towards optimal placement and scheduling of DNN operations with Pesto
Ubaid Ullah Hafeez, Xiao Sun, Anshul Gandhi, Zhenhua Liu
Proceedings of the 22nd International Middleware Conference, October 2021
DOI: 10.1145/3464298.3476132 (https://doi.org/10.1145/3464298.3476132)
Citations: 15
Abstract
The increasing size of Deep Neural Networks (DNNs) has necessitated the use of multiple GPUs to host a single DNN model, a practice commonly referred to as model parallelism. The key challenge for model parallelism is to efficiently and effectively partition the DNN model across GPUs to avoid communication overheads while maximizing GPU utilization, with the end goal of minimizing the training time of DNN models. Existing approaches either take a long time (hours or even days) to find an effective partition or settle for sub-optimal partitioning, invariably increasing the end-to-end training effort. In this paper, we design and implement Pesto, a fast and near-optimal model placement technique for automatically partitioning arbitrary DNNs across multiple GPUs. The key idea in Pesto is to jointly optimize the model placement and scheduling at the fine-grained operation level to minimize inter-GPU communication while maximizing the opportunity to parallelize the model across GPUs. By carefully formulating the problem as an integer program, Pesto can provide the optimal placement and scheduling. We implement Pesto in TensorFlow and show that Pesto can reduce model training time by up to 31% compared to state-of-the-art approaches, across several large DNN models.
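The abstract describes formulating joint operation placement and scheduling as an integer program. The sketch below is a rough, minimal illustration of what such a formulation can look like, not Pesto's actual model: it places a toy four-operation graph on two GPUs, charges a communication delay whenever a dependency edge crosses GPUs, and minimizes the makespan. The operation graph, costs, variable names, and the use of the PuLP solver are all illustrative assumptions.

```python
# Illustrative sketch (assumed, not Pesto's formulation): joint placement and
# scheduling of DNN operations on two GPUs as an integer program, minimizing
# the makespan with a penalty for edges that cross GPUs.
import pulp

# Toy operation graph: per-op compute times and dependency edges with comm costs.
compute = {"a": 2, "b": 3, "c": 2, "d": 4}
edges = {("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 2, ("c", "d"): 2}
gpus = [0, 1]
BIG_M = 1000  # large constant for the ordering (disjunctive) constraints

prob = pulp.LpProblem("placement_and_scheduling", pulp.LpMinimize)

# x[op][g] = 1 if op runs on GPU g; s[op] = start time; makespan = objective.
x = {o: {g: pulp.LpVariable(f"x_{o}_{g}", cat="Binary") for g in gpus} for o in compute}
s = {o: pulp.LpVariable(f"s_{o}", lowBound=0) for o in compute}
makespan = pulp.LpVariable("makespan", lowBound=0)
prob += makespan

for o in compute:
    prob += pulp.lpSum(x[o][g] for g in gpus) == 1    # each op on exactly one GPU
    prob += makespan >= s[o] + compute[o]             # makespan covers every op

# split = 1 if the two endpoints of an edge land on different GPUs.
for (u, v), comm in edges.items():
    split = pulp.LpVariable(f"split_{u}_{v}", cat="Binary")
    for g in gpus:
        prob += split >= x[u][g] - x[v][g]
    prob += s[v] >= s[u] + compute[u] + comm * split  # dependency + comm delay

# No two ops overlap on the same GPU: big-M disjunction picks an order per pair.
ops = list(compute)
for i, u in enumerate(ops):
    for v in ops[i + 1:]:
        before = pulp.LpVariable(f"before_{u}_{v}", cat="Binary")
        for g in gpus:
            same_gpu_slack = BIG_M * (2 - x[u][g] - x[v][g])  # inactive if not co-located
            prob += s[v] >= s[u] + compute[u] - BIG_M * (1 - before) - same_gpu_slack
            prob += s[u] >= s[v] + compute[v] - BIG_M * before - same_gpu_slack

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for o in compute:
    gpu = next(g for g in gpus if x[o][g].value() > 0.5)
    print(f"{o} -> GPU {gpu}, start {s[o].value()}")
print("makespan:", makespan.value())
```

Even on this toy instance, the solver trades off spreading independent operations ("b" and "c") across GPUs against the communication delays their edges incur, which is the same tension the paper's fine-grained, operation-level formulation targets at scale.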