EmbRace: Accelerating Sparse Communication for Distributed Training of Deep Neural Networks

Shengwei Li, Zhiquan Lai, Dongsheng Li, Xiangyu Ye, Yabo Duan
{"title":"拥抱:加速稀疏通信用于深度神经网络的分布式训练","authors":"Shengwei Li, Zhiquan Lai, Dongsheng Li, Xiangyu Ye, Yabo Duan","doi":"10.1145/3545008.3545011","DOIUrl":null,"url":null,"abstract":"Distributed data-parallel training has been widely adopted for deep neural network (DNN) models. Although current deep learning (DL) frameworks scale well for dense models like image classification models, we find that these DL frameworks have relatively low scalability for sparse models like natural language processing (NLP) models that have highly sparse embedding tables. Most existing works overlook the sparsity of model parameters thus suffering from significant but unnecessary communication overhead. In this paper, we propose EmbRace, an efficient communication framework to accelerate communications of distributed training for sparse models. EmbRace introduces Sparsity-aware Hybrid Communication, which integrates AlltoAll and model parallelism into data-parallel training, so as to reduce the communication overhead of highly sparse parameters. To effectively overlap sparse communication with both backward and forward computation, EmbRace further designs a 2D Communication Scheduling approach which optimizes the model computation procedure, relaxes the dependency of embeddings, and schedules the sparse communications of each embedding row with a priority queue. We have implemented a prototype of EmbRace based on PyTorch and Horovod, and conducted comprehensive evaluations with four representative NLP models. Experimental results show that EmbRace achieves up to 2.41 × speedup compared to the state-of-the-art distributed training baselines.","PeriodicalId":360504,"journal":{"name":"Proceedings of the 51st International Conference on Parallel Processing","volume":"74 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"EmbRace: Accelerating Sparse Communication for Distributed Training of Deep Neural Networks\",\"authors\":\"Shengwei Li, Zhiquan Lai, Dongsheng Li, Xiangyu Ye, Yabo Duan\",\"doi\":\"10.1145/3545008.3545011\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Distributed data-parallel training has been widely adopted for deep neural network (DNN) models. Although current deep learning (DL) frameworks scale well for dense models like image classification models, we find that these DL frameworks have relatively low scalability for sparse models like natural language processing (NLP) models that have highly sparse embedding tables. Most existing works overlook the sparsity of model parameters thus suffering from significant but unnecessary communication overhead. In this paper, we propose EmbRace, an efficient communication framework to accelerate communications of distributed training for sparse models. EmbRace introduces Sparsity-aware Hybrid Communication, which integrates AlltoAll and model parallelism into data-parallel training, so as to reduce the communication overhead of highly sparse parameters. To effectively overlap sparse communication with both backward and forward computation, EmbRace further designs a 2D Communication Scheduling approach which optimizes the model computation procedure, relaxes the dependency of embeddings, and schedules the sparse communications of each embedding row with a priority queue. 
We have implemented a prototype of EmbRace based on PyTorch and Horovod, and conducted comprehensive evaluations with four representative NLP models. Experimental results show that EmbRace achieves up to 2.41 × speedup compared to the state-of-the-art distributed training baselines.\",\"PeriodicalId\":360504,\"journal\":{\"name\":\"Proceedings of the 51st International Conference on Parallel Processing\",\"volume\":\"74 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-10-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 51st International Conference on Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3545008.3545011\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 51st International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3545008.3545011","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3

Abstract

Distributed data-parallel training has been widely adopted for deep neural network (DNN) models. Although current deep learning (DL) frameworks scale well for dense models such as image classification models, we find that they have relatively low scalability for sparse models such as natural language processing (NLP) models with highly sparse embedding tables. Most existing works overlook the sparsity of model parameters and therefore suffer from significant but unnecessary communication overhead. In this paper, we propose EmbRace, an efficient communication framework that accelerates the communication of distributed training for sparse models. EmbRace introduces Sparsity-aware Hybrid Communication, which integrates AlltoAll and model parallelism into data-parallel training to reduce the communication overhead of highly sparse parameters. To effectively overlap sparse communication with both backward and forward computation, EmbRace further designs a 2D Communication Scheduling approach that optimizes the model computation procedure, relaxes the dependencies on embeddings, and schedules the sparse communication of each embedding row with a priority queue. We have implemented a prototype of EmbRace on top of PyTorch and Horovod and conducted comprehensive evaluations with four representative NLP models. Experimental results show that EmbRace achieves up to 2.41× speedup over state-of-the-art distributed training baselines.
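As a rough illustration of the Sparsity-aware Hybrid Communication idea described above (a minimal sketch, not the authors' implementation), the code below keeps dense gradients on the usual AllReduce path while exchanging the touched rows of a sharded embedding table with a single AlltoAll, so each rank receives only the gradient rows belonging to its own shard. The round-robin sharding, the names `hybrid_grad_sync` and `owner_of`, and the omission of the row-index exchange are simplifying assumptions; an AlltoAll-capable backend such as NCCL or MPI is also assumed.

```python
import torch
import torch.distributed as dist


def owner_of(row: int, world_size: int) -> int:
    # Assumed round-robin sharding of embedding rows across ranks.
    return row % world_size


def hybrid_grad_sync(dense_grads, emb_grad_rows, emb_grad_values):
    """Synchronize one iteration's gradients (illustrative sketch).

    dense_grads     -- list of dense gradient tensors (AllReduce, as usual).
    emb_grad_rows   -- 1-D LongTensor of embedding rows touched on this rank.
    emb_grad_values -- 2-D tensor of the corresponding gradient rows.
    """
    rank, world_size = dist.get_rank(), dist.get_world_size()

    # Dense part: standard data-parallel AllReduce (gradient averaging).
    for g in dense_grads:
        dist.all_reduce(g, op=dist.ReduceOp.SUM)
        g.div_(world_size)

    # Sparse part: bucket each touched row's gradient by the rank that owns it.
    dim = emb_grad_values.size(1)
    buckets = [[] for _ in range(world_size)]
    for i, row in enumerate(emb_grad_rows.tolist()):
        buckets[owner_of(row, world_size)].append(emb_grad_values[i])
    send_bufs = [
        torch.stack(b) if b else emb_grad_values.new_empty((0, dim))
        for b in buckets
    ]

    # Exchange bucket sizes so receive buffers can be pre-allocated ...
    sizes = torch.tensor([b.size(0) for b in send_bufs],
                         device=emb_grad_values.device)
    all_sizes = [torch.zeros_like(sizes) for _ in range(world_size)]
    dist.all_gather(all_sizes, sizes)

    # ... then a single AlltoAll delivers to every rank exactly the gradient
    # rows of its own embedding shard (row indices would be exchanged the same
    # way; omitted here for brevity).
    recv_bufs = [
        emb_grad_values.new_empty((int(all_sizes[src][rank]), dim))
        for src in range(world_size)
    ]
    dist.all_to_all(recv_bufs, send_bufs)
    return recv_bufs  # each rank applies these rows to its local shard
```

Compared with densifying the embedding gradient and running AllReduce over the whole table, this style of exchange only moves the rows that were actually touched, which is where the communication savings for highly sparse parameters come from.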
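The 2D Communication Scheduling component orders per-row sparse communication with a priority queue so that transfers overlap with the remaining backward and forward computation. The self-contained sketch below shows only the queueing idea, assuming rows needed earliest in the next forward pass should be sent first; the priority metric and the `launch_row_comm` callback are illustrative assumptions rather than the paper's API.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def schedule_row_comms(
    touched_rows: Iterable[Tuple[int, int]],
    launch_row_comm: Callable[[int], object],
) -> List[object]:
    """touched_rows: (row_id, first_use_step) pairs, where first_use_step is
    the position in the next forward pass at which the row is first needed.
    launch_row_comm: launches an asynchronous transfer for one row and returns
    a handle that can later be waited on."""
    heap: List[Tuple[int, int]] = []
    for row_id, first_use_step in touched_rows:
        # Lower first_use_step == needed sooner == higher priority.
        heapq.heappush(heap, (first_use_step, row_id))

    handles = []
    while heap:
        _, row_id = heapq.heappop(heap)
        handles.append(launch_row_comm(row_id))  # async; computation continues
    return handles


if __name__ == "__main__":
    # Toy usage: pretend launching a transfer just records the launch order.
    order = []
    schedule_row_comms([(17, 3), (4, 0), (99, 1)], lambda r: order.append(r))
    print(order)  # [4, 99, 17] -- rows needed earliest are launched first
```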