EAGLE: Expedited Device Placement with Automatic Grouping for Large Models
Hao Lan, Li Chen, Baochun Li
2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Published: 2021-05-01
DOI: 10.1109/IPDPS49936.2021.00068
Citations: 5
Abstract
Large, advanced deep neural networks are usually trained on a mixture of devices, including multiple CPUs and GPUs. Training speed and efficiency are drastically affected by how operations are placed on these devices. To identify the optimal device placement, the state-of-the-art method applies reinforcement learning with a hierarchical model, which partitions the operations into groups and then assigns each group to a specific device. However, because the grouping decisions add an extra dimension coupled with the placement, the efficiency of the reinforcement learning is greatly reduced. As modern neural networks grow in size and complexity, the low efficiency and high cost of device placement are further aggravated. In this paper, we propose our design of EAGLE (Expedited Automatic Grouping for Large modEls), which integrates automatic grouping into reinforcement learning-based placement in an optimal way, to achieve the best possible training time for very large models. An extra RNN is introduced to transform the parameters of the grouper into inputs of the placer, linking the originally separate parts together. Further optimizations have also been made to the network inputs. We have deployed and extensively evaluated EAGLE on the Inception-V3, GNMT and BERT benchmarks. Compared with the state of the art, the performance achieved by our design, measured by the per-step time under the resulting placement, is 2.7% and 18.7% better for GNMT and BERT, respectively. For Inception-V3, our design discovers the optimal placement in the shortest time.
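To make the hierarchical grouper-placer structure described above concrete, the following is a minimal sketch, not the paper's implementation: a grouper scores each operation over candidate groups, and a placer then scores each group over devices, yielding an op-to-device assignment. Random linear scorers stand in for the learned grouper and placer RNNs, and all sizes and names (`NUM_OPS`, `W_group`, `W_place`, etc.) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_OPS, NUM_GROUPS, NUM_DEVICES, FEAT = 8, 3, 2, 4

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Grouper: per-op logits over groups (stand-in for a learned grouper network).
op_features = rng.normal(size=(NUM_OPS, FEAT))
W_group = rng.normal(size=(FEAT, NUM_GROUPS))
groups = softmax(op_features @ W_group).argmax(axis=1)   # group id per op

# Placer: pool op features per group, then score devices for each group
# (stand-in for the placer RNN consuming group-level inputs).
group_feats = np.stack([
    op_features[groups == g].mean(axis=0) if (groups == g).any()
    else np.zeros(FEAT)
    for g in range(NUM_GROUPS)
])
W_place = rng.normal(size=(FEAT, NUM_DEVICES))
placement = softmax(group_feats @ W_place).argmax(axis=1)  # device per group

# Final assignment: every op inherits its group's device.
device_of_op = placement[groups]
print(device_of_op)
```

In an RL setting, both scorers would be trained with the measured per-step runtime of the resulting placement as the reward; this sketch only shows how the two decision levels compose.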