{"title":"Efficient All-Reduce for Distributed DNN Training in Optical Interconnect Systems","authors":"Fei Dai, Yawen Chen, Zhiyi Huang, Haibo Zhang, Fangfang Zhang","doi":"10.1145/3572848.3577391","DOIUrl":null,"url":null,"abstract":"All-reduce is the crucial communication primitive to reduce model parameters in distributed Deep Neural Networks (DNN) training. Most existing all-reduce algorithms are designed for traditional electrical interconnect systems, which cannot meet the communication requirements for distributed training of large DNNs due to the low data bandwidth of the electrical interconnect systems. One of the promising alternatives for electrical interconnect is optical interconnect, which can provide high bandwidth, low transmission delay, and low power cost. We propose an efficient scheme called WRHT (Wavelength Reused Hierarchical Tree) for implementing all-reduce operation in optical interconnect systems. WRHT can take advantage of WDM (Wavelength Division Multiplexing) to reduce the communication time of distributed data-parallel DNN training. Simulations using real DNN models show that, compared to all-reduce algorithms in the electrical and optical network systems, our approach reduces communication time by 75.76% and 91.86%, respectively.","PeriodicalId":233744,"journal":{"name":"Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3572848.3577391","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
All-reduce is a key communication primitive for reducing model parameters across workers in distributed Deep Neural Network (DNN) training. Most existing all-reduce algorithms are designed for traditional electrical interconnect systems, whose low data bandwidth cannot meet the communication requirements of distributed training for large DNNs. A promising alternative to electrical interconnects is optical interconnects, which offer high bandwidth, low transmission delay, and low power consumption. We propose an efficient scheme called WRHT (Wavelength Reused Hierarchical Tree) for implementing the all-reduce operation in optical interconnect systems. WRHT exploits WDM (Wavelength Division Multiplexing) to reduce the communication time of distributed data-parallel DNN training. Simulations using real DNN models show that, compared to all-reduce algorithms in electrical and optical network systems, our approach reduces communication time by 75.76% and 91.86%, respectively.
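To make the primitive concrete, below is a minimal, self-contained Python sketch of a tree-based all-reduce: partial sums flow up a binary tree, then the root broadcasts the total back down. This illustrates only the hierarchical-tree communication pattern; the wavelength reuse and WDM scheduling that distinguish WRHT are properties of the optical network and are not modeled here. All names (`tree_all_reduce`, `buf`, etc.) are illustrative, not from the paper.

```python
# Sketch of a tree-based all-reduce over n simulated workers.
# NOT the paper's WRHT algorithm: WRHT additionally maps this tree onto
# optical links and reuses WDM wavelengths, which is not modeled here.

from typing import List

def tree_all_reduce(grads: List[List[float]]) -> List[List[float]]:
    """Sum every worker's gradient vector; every worker ends with the total."""
    n = len(grads)
    buf = [g[:] for g in grads]  # each worker's local buffer

    # Reduce phase: at stride s, worker i accumulates worker i+s's buffer,
    # halving the number of active workers each round (log2(n) rounds).
    s = 1
    while s < n:
        for i in range(0, n - s, 2 * s):
            buf[i] = [a + b for a, b in zip(buf[i], buf[i + s])]
        s *= 2

    # Broadcast phase: the root (worker 0) sends the reduced result back
    # down the same tree by reversing the reduce schedule.
    while s > 1:
        s //= 2
        for i in range(0, n - s, 2 * s):
            buf[i + s] = buf[i][:]
    return buf

if __name__ == "__main__":
    # 4 workers, each holding a 3-element gradient.
    workers = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0],
               [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]
    for w in tree_all_reduce(workers):
        print(w)  # every worker prints [22.0, 26.0, 30.0]
```

Each phase takes a logarithmic number of rounds in the worker count, which is why hierarchical trees are attractive when per-round communication is expensive; WRHT's contribution, per the abstract, is realizing this pattern efficiently on optical interconnects via wavelength reuse.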