{"title":"SketchML:使用数据草图加速分布式机器学习","authors":"Jiawei Jiang, Fangcheng Fu, Tong Yang, B. Cui","doi":"10.1145/3183713.3196894","DOIUrl":null,"url":null,"abstract":"To address the challenge of explosive big data, distributed machine learning (ML) has drawn the interests of many researchers. Since many distributed ML algorithms trained by stochastic gradient descent (SGD) involve communicating gradients through the network, it is important to compress the transferred gradient. A category of low-precision algorithms can significantly reduce the size of gradients, at the expense of some precision loss. However, existing low-precision methods are not suitable for many cases where the gradients are sparse and nonuniformly distributed. In this paper, we study is there a compression method that can efficiently handle a sparse and nonuniform gradient consisting of key-value pairs? Our first contribution is a sketch based method that compresses the gradient values. Sketch is a class of algorithms using a probabilistic data structure to approximate the distribution of input data. We design a quantile-bucket quantification method that uses a quantile sketch to sort gradient values into buckets and encodes them with the bucket indexes. To further compress the bucket indexes, our second contribution is a sketch algorithm, namely MinMaxSketch. MinMaxSketch builds a set of hash tables and solves hash collisions with a MinMax strategy. The third contribution of this paper is a delta-binary encoding method that calculates the increment of the gradient keys and stores them with fewer bytes. We also theoretically discuss the correctness and the error bound of three proposed methods. To the best of our knowledge, this is the first effort combining data sketch with ML. We implement a prototype system in a real cluster of our industrial partner Tencent Inc., and show that our method is up to 10X faster than existing methods.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"90","resultStr":"{\"title\":\"SketchML: Accelerating Distributed Machine Learning with Data Sketches\",\"authors\":\"Jiawei Jiang, Fangcheng Fu, Tong Yang, B. Cui\",\"doi\":\"10.1145/3183713.3196894\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"To address the challenge of explosive big data, distributed machine learning (ML) has drawn the interests of many researchers. Since many distributed ML algorithms trained by stochastic gradient descent (SGD) involve communicating gradients through the network, it is important to compress the transferred gradient. A category of low-precision algorithms can significantly reduce the size of gradients, at the expense of some precision loss. However, existing low-precision methods are not suitable for many cases where the gradients are sparse and nonuniformly distributed. In this paper, we study is there a compression method that can efficiently handle a sparse and nonuniform gradient consisting of key-value pairs? Our first contribution is a sketch based method that compresses the gradient values. Sketch is a class of algorithms using a probabilistic data structure to approximate the distribution of input data. 
We design a quantile-bucket quantification method that uses a quantile sketch to sort gradient values into buckets and encodes them with the bucket indexes. To further compress the bucket indexes, our second contribution is a sketch algorithm, namely MinMaxSketch. MinMaxSketch builds a set of hash tables and solves hash collisions with a MinMax strategy. The third contribution of this paper is a delta-binary encoding method that calculates the increment of the gradient keys and stores them with fewer bytes. We also theoretically discuss the correctness and the error bound of three proposed methods. To the best of our knowledge, this is the first effort combining data sketch with ML. We implement a prototype system in a real cluster of our industrial partner Tencent Inc., and show that our method is up to 10X faster than existing methods.\",\"PeriodicalId\":20430,\"journal\":{\"name\":\"Proceedings of the 2018 International Conference on Management of Data\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-05-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"90\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2018 International Conference on Management of Data\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3183713.3196894\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3183713.3196894","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
SketchML: Accelerating Distributed Machine Learning with Data Sketches
To address the challenge of explosively growing big data, distributed machine learning (ML) has drawn the interest of many researchers. Since many distributed ML algorithms trained by stochastic gradient descent (SGD) involve communicating gradients over the network, it is important to compress the transferred gradients. A category of low-precision algorithms can significantly reduce the size of gradients, at the expense of some loss in precision. However, existing low-precision methods are not suitable for many cases where the gradients are sparse and nonuniformly distributed. In this paper, we ask whether there is a compression method that can efficiently handle a sparse and nonuniform gradient consisting of key-value pairs. Our first contribution is a sketch-based method that compresses the gradient values. Sketches are a class of algorithms that use probabilistic data structures to approximate the distribution of input data. We design a quantile-bucket quantification method that uses a quantile sketch to sort gradient values into buckets and encodes them with the bucket indexes. To further compress the bucket indexes, our second contribution is a sketch algorithm, namely MinMaxSketch. MinMaxSketch builds a set of hash tables and resolves hash collisions with a MinMax strategy. The third contribution of this paper is a delta-binary encoding method that calculates the increments of the gradient keys and stores them with fewer bytes. We also theoretically discuss the correctness and the error bounds of the three proposed methods. To the best of our knowledge, this is the first effort to combine data sketches with ML. We implement a prototype system on a real cluster of our industrial partner Tencent Inc., and show that our method is up to 10X faster than existing methods.
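To make the first contribution concrete, here is a minimal Python sketch of quantile-bucket quantification as described in the abstract: gradient values are mapped to equal-frequency buckets and transmitted as small bucket indexes plus one representative value per bucket. It uses exact quantiles where the paper uses an approximate quantile sketch, and all function names are hypothetical.

```python
import numpy as np

def build_buckets(values, num_buckets=256):
    """Split the value range into equal-frequency buckets via quantiles.

    The paper relies on an approximate quantile sketch; exact quantiles
    are used here only to keep the illustration short.
    """
    probs = np.linspace(0.0, 1.0, num_buckets + 1)
    edges = np.quantile(values, probs)
    # Representative value of each bucket: the midpoint of its edges.
    reps = (edges[:-1] + edges[1:]) / 2.0
    return edges, reps

def encode_values(values, edges):
    """Map each gradient value to the index of its quantile bucket."""
    idx = np.searchsorted(edges, values, side="right") - 1
    return np.clip(idx, 0, len(edges) - 2).astype(np.uint8)

def decode_values(indexes, reps):
    """Recover approximate gradient values from bucket indexes."""
    return reps[indexes]

# Example: a sparse, non-uniform gradient compresses from 8-byte floats
# to 1-byte bucket indexes plus a small bucket table.
grad = np.random.laplace(scale=0.01, size=10_000)
edges, reps = build_buckets(grad)
codes = encode_values(grad, edges)
approx = decode_values(codes, reps)
```

With 256 buckets, each 8-byte float is replaced by a 1-byte index, so the value part of the gradient shrinks by roughly 8x even before the indexes are compressed further.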
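The abstract states only that MinMaxSketch stores the bucket indexes in several hash tables and resolves collisions with a MinMax strategy. The sketch below is one plausible reading of that description, not the paper's exact data structure: insertions keep the smaller index when two keys collide, and queries take the maximum over the rows; all names and parameters are illustrative.

```python
import numpy as np

class MinMaxSketch:
    """Toy MinMax sketch over bucket indexes (an illustration only).

    Insert keeps the minimum index seen at each hashed cell, so a
    collision can only shrink a stored index; query takes the maximum
    over the rows to pick the least-corrupted surviving estimate.
    """

    def __init__(self, num_rows=3, num_cols=1024, num_buckets=256):
        self.num_rows = num_rows
        self.num_cols = num_cols
        # Initialise every cell to the largest bucket index so that
        # min-updates always take effect on first insertion.
        self.table = np.full((num_rows, num_cols), num_buckets - 1, dtype=np.uint8)
        self.seeds = [1000003 * (row + 1) + 17 for row in range(num_rows)]

    def _hash(self, key, row):
        return (key * self.seeds[row] + row) % self.num_cols

    def insert(self, key, bucket_index):
        for row in range(self.num_rows):
            col = self._hash(key, row)
            self.table[row, col] = min(self.table[row, col], bucket_index)

    def query(self, key):
        return int(max(self.table[row, self._hash(key, row)]
                       for row in range(self.num_rows)))

# Usage: compress the bucket indexes of a sparse gradient.
sk = MinMaxSketch()
for key, code in [(5, 12), (90, 200), (4096, 3)]:
    sk.insert(key, code)
print(sk.query(90))  # 200, unless collisions in every row pushed it lower
```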
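The delta-binary encoding of the gradient keys can be sketched similarly: once the keys (feature indexes) are sorted, consecutive keys differ by small amounts, so storing deltas takes far fewer bytes than storing raw 4- or 8-byte keys. The per-delta size tag and the 1/2/4-byte size classes below are assumptions made for illustration, not the paper's exact wire format.

```python
import struct

def delta_encode_keys(sorted_keys):
    """Store sorted integer keys as deltas, each with the fewest bytes
    that can hold it (1, 2, or 4), prefixed by a one-byte size tag.

    A real implementation would group deltas to amortise the tags; this
    version keeps one tag per delta for clarity.
    """
    out = bytearray()
    prev = 0
    for key in sorted_keys:
        delta = key - prev
        prev = key
        if delta < 1 << 8:
            out += struct.pack("<BB", 1, delta)
        elif delta < 1 << 16:
            out += struct.pack("<BH", 2, delta)
        else:
            out += struct.pack("<BI", 4, delta)
    return bytes(out)

def delta_decode_keys(data):
    """Invert delta_encode_keys."""
    keys, prev, pos = [], 0, 0
    fmt = {1: "<B", 2: "<H", 4: "<I"}
    while pos < len(data):
        size = data[pos]
        pos += 1
        (delta,) = struct.unpack_from(fmt[size], data, pos)
        pos += size
        prev += delta
        keys.append(prev)
    return keys

# Sparse feature indexes are close together once sorted, so most deltas
# fit in a single byte instead of the 4 or 8 bytes of the raw keys.
keys = [3, 17, 18, 1025, 70000]
assert delta_decode_keys(delta_encode_keys(keys)) == keys
```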