S. Samsi, Andrew Prout, Michael Jones, Andrew Kirby, Bill Arcand, Bill Bergeron, David Bestor, C. Byun, V. Gadepally, Michael Houle, M. Hubbell, Anna Klein, P. Michaleas, Lauren Milechin, J. Mullen, Antonio Rosa, Charles Yee, A. Reuther, J. Kepner
{"title":"Benchmarking network fabrics for data distributed training of deep neural networks","authors":"S. Samsi, Andrew Prout, Michael Jones, Andrew Kirby, Bill Arcand, Bill Bergeron, David Bestor, C. Byun, V. Gadepally, Michael Houle, M. Hubbell, Anna Klein, P. Michaleas, Lauren Milechin, J. Mullen, Antonio Rosa, Charles Yee, A. Reuther, J. Kepner","doi":"10.1109/HPEC43674.2020.9286232","DOIUrl":null,"url":null,"abstract":"Artificial Intelligence/Machine Learning applications require the training of complex models on large amounts of labelled data. The large computational requirements for training deep models have necessitated the development of new methods for faster training. One such approach is the data parallel approach, where the training data is distributed across multiple compute nodes. This approach is simple to implement and supported by most of the commonly used machine learning frameworks. The data parallel approach leverages MPI for communicating gradients across all nodes. In this paper, we examine the effects of using different physical hardware interconnects and network-related software primitives for enabling data distributed deep learning. We compare the effect of using GPUDirect and NCCL on Ethernet and OmniPath fabrics. Our results show that using Ethernet-based networking in shared HPC systems does not have a significant effect on the training times for commonly used deep neural network architectures or traditional HPC applications such as Computational Fluid Dynamics.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC43674.2020.9286232","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Artificial Intelligence/Machine Learning applications require training complex models on large amounts of labelled data. The heavy computational requirements of training deep models have necessitated the development of new methods for faster training. One such approach is data-parallel training, in which the training data is distributed across multiple compute nodes. This approach is simple to implement and is supported by most commonly used machine learning frameworks. The data-parallel approach leverages MPI to communicate gradients across all nodes. In this paper, we examine the effects of different physical hardware interconnects and network-related software primitives on data-distributed deep learning. We compare the effect of using GPUDirect and NCCL on Ethernet and OmniPath fabrics. Our results show that using Ethernet-based networking in shared HPC systems does not have a significant effect on training times for commonly used deep neural network architectures, or on traditional HPC applications such as Computational Fluid Dynamics.
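To make the data-parallel setup concrete, the sketch below shows a minimal distributed training loop in PyTorch using DistributedDataParallel. This is an illustrative assumption, not the authors' benchmark code: the model, synthetic dataset, and hyperparameters are placeholders, and the communication backend string ("nccl" here; "gloo" or "mpi" are alternatives) selects the library that all-reduces gradients over the fabric, which is the variable the paper benchmarks.

```python
# Hypothetical sketch of data-parallel training; not the paper's code.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    # Rank and world size are supplied by the launcher
    # (e.g. torchrun, or an MPI wrapper such as mpirun).
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    device = torch.device(f"cuda:{rank % torch.cuda.device_count()}")

    # Placeholder model and synthetic data standing in for a real
    # DNN and a labelled dataset.
    model = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(),
                          nn.Linear(256, 10)).to(device)
    model = DDP(model, device_ids=[device.index])

    dataset = TensorDataset(torch.randn(4096, 1024),
                            torch.randint(0, 10, (4096,)))
    # DistributedSampler shards the data so each worker trains on a
    # disjoint subset -- the defining property of the data-parallel approach.
    loader = DataLoader(dataset, batch_size=64,
                        sampler=DistributedSampler(dataset))

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()   # gradients are all-reduced across nodes here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Under this setup, only the backend choice and the launcher change when moving between interconnects; the training loop itself is unchanged, which is why interconnect comparisons like the one in this paper can hold the model and framework fixed.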