{"title":"管道模型并行的内存感知动态规划算法","authors":"Olivier Beaumont, Lionel Eyraud-Dubois, Alena Shilova","doi":"10.1109/IPDPSW55747.2022.00174","DOIUrl":null,"url":null,"abstract":"The training phase in Deep Neural Networks (DNNs) is very computationally intensive and is nowadays often performed on parallel computing platforms, ranging from a few GPUs to several thousand GPUs. The strategy of choice for the parallelization of training is the so-called data parallel approach, based on the parallel training of the different inputs (typically images) and the aggregation of network weights with collective communications (AllReduce operation). The scalability of this approach is limited both by the memory available on each node and the networking capacities for collective operations. Recently, a parallel model approach has been proposed (PipeDream, Gpipe), in which the DNN weights are distributed and images are trained in a pipeline/stream manner over the computational nodes. In this paper, we formalize in detail the optimization problem associated with the placement of DNN layers onto computation resources when using pipelined model parallelism, and we derive a dynamic programming based heuristic, MadPipe. We show through extensive simulations based on realistic networks that MadPipe significantly improves the performance of the pipelined parallel model approach compared to PipeDream.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"MadPipe: Memory Aware Dynamic Programming Algorithm for Pipelined Model Parallelism\",\"authors\":\"Olivier Beaumont, Lionel Eyraud-Dubois, Alena Shilova\",\"doi\":\"10.1109/IPDPSW55747.2022.00174\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The training phase in Deep Neural Networks (DNNs) is very computationally intensive and is nowadays often performed on parallel computing platforms, ranging from a few GPUs to several thousand GPUs. The strategy of choice for the parallelization of training is the so-called data parallel approach, based on the parallel training of the different inputs (typically images) and the aggregation of network weights with collective communications (AllReduce operation). The scalability of this approach is limited both by the memory available on each node and the networking capacities for collective operations. Recently, a parallel model approach has been proposed (PipeDream, Gpipe), in which the DNN weights are distributed and images are trained in a pipeline/stream manner over the computational nodes. In this paper, we formalize in detail the optimization problem associated with the placement of DNN layers onto computation resources when using pipelined model parallelism, and we derive a dynamic programming based heuristic, MadPipe. 
We show through extensive simulations based on realistic networks that MadPipe significantly improves the performance of the pipelined parallel model approach compared to PipeDream.\",\"PeriodicalId\":286968,\"journal\":{\"name\":\"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)\",\"volume\":\"35 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW55747.2022.00174\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW55747.2022.00174","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
MadPipe: Memory Aware Dynamic Programming Algorithm for Pipelined Model Parallelism
The training phase of Deep Neural Networks (DNNs) is very computationally intensive and is nowadays often performed on parallel computing platforms ranging from a few GPUs to several thousand GPUs. The strategy of choice for parallelizing training is the so-called data parallel approach, based on training different inputs (typically images) in parallel and aggregating the network weights with collective communications (an AllReduce operation). The scalability of this approach is limited both by the memory available on each node and by the networking capacity for collective operations. Recently, model parallel approaches have been proposed (PipeDream, GPipe), in which the DNN weights are distributed across the computational nodes and images are processed in a pipelined/streamed manner. In this paper, we formalize in detail the optimization problem associated with the placement of DNN layers onto computation resources when using pipelined model parallelism, and we derive a dynamic-programming-based heuristic, MadPipe. We show through extensive simulations based on realistic networks that MadPipe significantly improves the performance of the pipelined model parallel approach compared to PipeDream.
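To make the layer-placement problem concrete, here is a minimal sketch of a dynamic program that assigns contiguous blocks of layers to devices so as to minimize the pipeline bottleneck (the slowest stage) under a per-device memory limit. This is not the MadPipe algorithm described in the paper; the function name, the additive memory model, and the example layer costs are illustrative assumptions only.

```python
# Hypothetical sketch: split a chain of DNN layers into contiguous stages over
# num_devices devices, minimizing the bottleneck stage time subject to a
# per-device memory limit. NOT the MadPipe algorithm; costs are illustrative.
from functools import lru_cache

def partition_layers(times, mems, num_devices, mem_limit):
    """Return the smallest achievable bottleneck (max stage time),
    or inf if no memory-feasible placement exists.

    times[i] -- assumed compute time of layer i (forward + backward)
    mems[i]  -- assumed memory footprint of layer i (weights + activations)
    """
    n = len(times)
    INF = float("inf")

    @lru_cache(maxsize=None)
    def best(start, devices_left):
        # Best bottleneck for layers start..n-1 using devices_left devices.
        if start == n:
            return 0.0
        if devices_left == 0:
            return INF
        bottleneck = INF
        stage_time = stage_mem = 0.0
        for end in range(start, n):          # current stage = layers [start, end]
            stage_time += times[end]
            stage_mem += mems[end]
            if stage_mem > mem_limit:        # stage no longer fits in memory
                break
            rest = best(end + 1, devices_left - 1)
            bottleneck = min(bottleneck, max(stage_time, rest))
        return bottleneck

    return best(0, num_devices)

# Example: 6 layers on 3 devices, each device holding at most 8 memory units.
print(partition_layers([2, 3, 1, 4, 2, 2], [3, 4, 2, 5, 3, 2], 3, 8.0))  # -> 5.0
```

A memory-aware approach such as the one studied in the paper refines this kind of formulation by accounting for activation storage during pipelined execution, which is what makes the placement problem harder than plain load balancing.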