GRACE: A Compressed Communication Framework for Distributed Machine Learning

Hang Xu, Chen-Yu Ho, A. Abdelmoniem, Aritra Dutta, E. Bergou, Konstantinos Karatsenidis, M. Canini, Panos Kalnis
{"title":"GRACE: A Compressed Communication Framework for Distributed Machine Learning","authors":"Hang Xu, Chen-Yu Ho, A. Abdelmoniem, Aritra Dutta, E. Bergou, Konstantinos Karatsenidis, M. Canini, Panos Kalnis","doi":"10.1109/ICDCS51616.2021.00060","DOIUrl":null,"url":null,"abstract":"Powerful computer clusters are used nowadays to train complex deep neural networks (DNN) on large datasets. Distributed training increasingly becomes communication bound. For this reason, many lossy compression techniques have been proposed to reduce the volume of transferred data. Unfortunately, it is difficult to argue about the behavior of compression methods, because existing work relies on inconsistent evaluation testbeds and largely ignores the performance impact of practical system configurations. In this paper, we present a comprehensive survey of the most influential compressed communication methods for DNN training, together with an intuitive classification (i.e., quantization, sparsification, hybrid and low-rank). Next, we propose GRACE, a unified framework and API that allows for consistent and easy implementation of compressed communication on popular machine learning toolkits. We instantiate GRACE on TensorFlow and PyTorch, and implement 16 such methods. Finally, we present a thorough quantitative evaluation with a variety of DNNs (convolutional and recurrent), datasets and system configurations. We show that the DNN architecture affects the relative performance among methods. Interestingly, depending on the underlying communication library and computational cost of compression / decompression, we demonstrate that some methods may be impractical. GRACE and the entire benchmarking suite are available as open-source.","PeriodicalId":222376,"journal":{"name":"2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"60","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDCS51616.2021.00060","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 60

Abstract

Powerful computer clusters are nowadays used to train complex deep neural networks (DNNs) on large datasets. Distributed training is increasingly communication-bound. For this reason, many lossy compression techniques have been proposed to reduce the volume of transferred data. Unfortunately, it is difficult to reason about the behavior of compression methods, because existing work relies on inconsistent evaluation testbeds and largely ignores the performance impact of practical system configurations. In this paper, we present a comprehensive survey of the most influential compressed communication methods for DNN training, together with an intuitive classification (i.e., quantization, sparsification, hybrid and low-rank). Next, we propose GRACE, a unified framework and API that allows for consistent and easy implementation of compressed communication on popular machine learning toolkits. We instantiate GRACE on TensorFlow and PyTorch, and implement 16 such methods. Finally, we present a thorough quantitative evaluation with a variety of DNNs (convolutional and recurrent), datasets and system configurations. We show that the DNN architecture affects the relative performance among methods. Interestingly, depending on the underlying communication library and the computational cost of compression/decompression, we demonstrate that some methods may be impractical. GRACE and the entire benchmarking suite are available as open source.
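
The abstract describes a unified API through which quantization, sparsification, hybrid and low-rank methods can be plugged into TensorFlow or PyTorch training. As a rough illustration of how such an abstraction can work, the sketch below puts a top-k sparsification compressor behind a generic compress/decompress interface; the class and method names are hypothetical and are not taken from the GRACE codebase.

```python
# Minimal sketch (not the official GRACE API) of a unified gradient-compression
# interface in the spirit of the paper: every method exposes compress() and
# decompress(), so the training loop stays agnostic to the specific technique.
import torch


class Compressor:
    """Hypothetical base interface: compress a gradient tensor, then restore it."""

    def compress(self, tensor: torch.Tensor):
        raise NotImplementedError

    def decompress(self, payload, ctx) -> torch.Tensor:
        raise NotImplementedError


class TopKCompressor(Compressor):
    """Sparsification example: keep only the k largest-magnitude gradient entries."""

    def __init__(self, compress_ratio: float = 0.01):
        self.compress_ratio = compress_ratio

    def compress(self, tensor: torch.Tensor):
        shape = tensor.shape
        flat = tensor.flatten()
        k = max(1, int(flat.numel() * self.compress_ratio))
        _, indices = torch.topk(flat.abs(), k)
        values = flat[indices]
        # Only (values, indices) would be sent over the network; shape is the
        # per-tensor context needed to rebuild the dense gradient.
        return (values, indices), shape

    def decompress(self, payload, ctx) -> torch.Tensor:
        values, indices = payload
        shape = ctx
        flat = torch.zeros(shape.numel(), dtype=values.dtype)
        flat[indices] = values
        return flat.reshape(shape)


if __name__ == "__main__":
    compressor = TopKCompressor(compress_ratio=0.1)
    grad = torch.randn(4, 5)                      # stand-in for a layer's gradient
    payload, ctx = compressor.compress(grad)
    restored = compressor.decompress(payload, ctx)
    kept = (restored != 0).sum().item()
    print(f"kept {kept} of {grad.numel()} entries")  # roughly 10% survive
```

In a distributed run, only the compressed payload would be exchanged between workers and each worker would decompress before the optimizer step; a quantization or low-rank method would slot in behind the same two calls.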