{"title":"深度学习中带聚合的梯度稀疏化通信使用优化","authors":"Sheng Wang, Pangfeng Liu, Jan-Jan Wu","doi":"10.1145/3301326.3301347","DOIUrl":null,"url":null,"abstract":"Communication usage is a bottleneck of scaling workers for distributed deep learning. One solution is to compress the exchanged gradients into sparse format with gradient sparsification. We found that the send cost of server, which is the aggregated size of sparse gradient, can be reduced by the gradient selection from workers. Following an observation that only a few gradients are significantly large and in a short period of time, we proposed several gradient selection algorithms based on different metrics. Experiment showed that our proposed method can reduce the aggregated size for server, and the reduction in time per iteration can make the convergence rate faster than traditional sparsification.","PeriodicalId":294040,"journal":{"name":"Proceedings of the 2018 VII International Conference on Network, Communication and Computing","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Communication Usage Optimization of Gradient Sparsification with Aggregation in Deep Learning\",\"authors\":\"Sheng Wang, Pangfeng Liu, Jan-Jan Wu\",\"doi\":\"10.1145/3301326.3301347\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Communication usage is a bottleneck of scaling workers for distributed deep learning. One solution is to compress the exchanged gradients into sparse format with gradient sparsification. We found that the send cost of server, which is the aggregated size of sparse gradient, can be reduced by the gradient selection from workers. Following an observation that only a few gradients are significantly large and in a short period of time, we proposed several gradient selection algorithms based on different metrics. Experiment showed that our proposed method can reduce the aggregated size for server, and the reduction in time per iteration can make the convergence rate faster than traditional sparsification.\",\"PeriodicalId\":294040,\"journal\":{\"name\":\"Proceedings of the 2018 VII International Conference on Network, Communication and Computing\",\"volume\":\"47 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-12-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2018 VII International Conference on Network, Communication and Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3301326.3301347\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 VII International Conference on Network, Communication and Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3301326.3301347","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Communication Usage Optimization of Gradient Sparsification with Aggregation in Deep Learning
Communication usage is a bottleneck when scaling the number of workers in distributed deep learning. One solution is to compress the exchanged gradients into a sparse format via gradient sparsification. We found that the send cost of the server, which is the aggregated size of the sparse gradients, can be reduced by how the workers select their gradients. Based on the observation that only a few gradients are significantly large, and only for a short period of time, we proposed several gradient selection algorithms based on different metrics. Experiments showed that our proposed method reduces the aggregated size at the server, and the resulting reduction in time per iteration yields faster convergence than traditional sparsification.
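For readers unfamiliar with the mechanism described in the abstract, the following is a minimal sketch of worker-side top-k gradient sparsification and server-side aggregation. It is an illustrative example only, not the authors' selection algorithm; the function names (`sparsify_topk`, `aggregate`) and the toy sizes are assumptions introduced for illustration. The key observation it makes concrete is that the aggregated result's size (the server's send cost) depends on how much the workers' selected indices overlap.

```python
import numpy as np

def sparsify_topk(grad, k):
    """Keep only the k largest-magnitude entries of a gradient tensor.

    Returns (indices, values), the sparse representation a worker
    would send instead of the dense gradient.
    """
    flat = grad.ravel()
    # Indices of the k entries with the largest absolute value.
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def aggregate(sparse_grads, size):
    """Server-side aggregation: sum the sparse gradients from all workers.

    The aggregated result is itself sparse; the number of distinct
    indices it contains determines the server's send cost when
    broadcasting the result back to the workers.
    """
    total = np.zeros(size)
    for idx, vals in sparse_grads:
        total[idx] += vals
    agg_idx = np.nonzero(total)[0]
    return agg_idx, total[agg_idx]

# Toy usage: 4 workers, each keeping the top 3 of 10 gradient entries.
rng = np.random.default_rng(0)
workers = [sparsify_topk(rng.normal(size=10), k=3) for _ in range(4)]
agg_idx, agg_vals = aggregate(workers, size=10)
# If the workers' selected indices overlap, len(agg_idx) < 4 * 3,
# so the aggregated size the server must send is smaller.
print("aggregated indices:", agg_idx)
```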