An In-Depth Analysis of Distributed Training of Deep Neural Networks

Yunyong Ko, Kibong Choi, Jiwon Seo, Sang-Wook Kim
{"title":"An In-Depth Analysis of Distributed Training of Deep Neural Networks","authors":"Yunyong Ko, Kibong Choi, Jiwon Seo, Sang-Wook Kim","doi":"10.1109/IPDPS49936.2021.00108","DOIUrl":null,"url":null,"abstract":"As the popularity of deep learning in industry rapidly grows, efficient training of deep neural networks (DNNs) becomes important. To train a DNN with a large amount of data, distributed training with data parallelism has been widely adopted. However, the communication overhead limits the scalability of distributed training. To reduce the overhead, a number of distributed training algorithms have been proposed. The model accuracy and training performance of those algorithms can be different depending on various factors such as cluster settings, training models/datasets, and optimization techniques applied. In order for someone to adopt a distributed training algorithm appropriate for her/his situation, it is required for her/him to fully understand the model accuracy and training performance of these algorithms in various settings. Toward this end, this paper reviews and evaluates seven popular distributed training algorithms (BSP, ASP, SSP, EASGD, AR-SGD, GoSGD, and AD-PSGD) in terms of the model accuracy and training performance in various settings. Specifically, we evaluate those algorithms for two CNN models, in different cluster settings, and with three well-known optimization techniques. Through extensive evaluation and analysis, we made several interesting discoveries. For example, we found out that some distributed training algorithms (SSP, EASGD, and GoSGD) have highly negative impact on the model accuracy because they adopt intermittent and asymmetric communication to improve training performance; the communication overhead of some centralized algorithms (ASP and SSP) is much higher than we expected in a cluster setting with limited network bandwidth because of the PS bottleneck problem. These findings, and many more in the paper, can guide the adoption of proper distributed training algorithms in industry; our findings can be useful in academia as well for designing new distributed training algorithms.","PeriodicalId":372234,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS49936.2021.00108","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9

Abstract

As the popularity of deep learning in industry grows rapidly, efficient training of deep neural networks (DNNs) becomes increasingly important. To train a DNN on a large amount of data, distributed training with data parallelism has been widely adopted. However, communication overhead limits the scalability of distributed training, and a number of distributed training algorithms have been proposed to reduce it. The model accuracy and training performance of these algorithms can differ depending on factors such as the cluster setting, the training models/datasets, and the optimization techniques applied. To adopt a distributed training algorithm appropriate for a given situation, practitioners need to fully understand the model accuracy and training performance of these algorithms across various settings. Toward this end, this paper reviews and evaluates seven popular distributed training algorithms (BSP, ASP, SSP, EASGD, AR-SGD, GoSGD, and AD-PSGD) in terms of model accuracy and training performance in various settings. Specifically, we evaluate these algorithms on two CNN models, in different cluster settings, and with three well-known optimization techniques. Through extensive evaluation and analysis, we made several interesting discoveries. For example, some distributed training algorithms (SSP, EASGD, and GoSGD) have a highly negative impact on model accuracy because they adopt intermittent and asymmetric communication to improve training performance, and the communication overhead of some centralized algorithms (ASP and SSP) is much higher than expected in a cluster setting with limited network bandwidth because of the parameter-server (PS) bottleneck. These findings, and many more in the paper, can guide the adoption of appropriate distributed training algorithms in industry; they can also be useful in academia for designing new distributed training algorithms.
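As a concrete illustration of the data-parallel setting the paper studies, below is a minimal sketch of one BSP-style (bulk synchronous parallel) training step written against PyTorch's torch.distributed API. This is not code from the paper; the function name `bsp_step`, the learning rate, and the plain SGD update are illustrative assumptions, and a process group is assumed to have been initialized elsewhere with `torch.distributed.init_process_group(...)`.

```python
# Minimal BSP-style data-parallel SGD step (illustrative sketch, not the paper's code).
# Assumes torch.distributed.init_process_group(...) has already been called on every worker.
import torch
import torch.distributed as dist

def bsp_step(model, loss_fn, inputs, targets, lr=0.1):
    world_size = dist.get_world_size()

    # Local compute phase: forward/backward on this worker's shard of the mini-batch.
    model.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # Synchronous communication phase: average gradients across all workers.
    # This barrier-like all-reduce is the cost BSP pays for keeping every
    # replica identical; ASP/SSP-style algorithms relax or skip it to improve
    # training throughput, at some cost to model accuracy.
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
                p.grad /= world_size
                p -= lr * p.grad  # identical SGD update on every worker

    return loss.item()
```

In the paper's taxonomy, ASP and SSP replace this all-reduce with asynchronous (or bounded-staleness) pushes to a parameter server, EASGD periodically pulls replicas toward a center variable, and gossip-style algorithms such as GoSGD and AD-PSGD exchange models with a subset of peers instead; these relaxations are the source of the accuracy/performance trade-offs summarized in the abstract.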