MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling
Jianyu Wang, Anit Kumar Sahu, Zhouyi Yang, Gauri Joshi, S. Kar
2019 Sixth Indian Control Conference (ICC), published 2019-05-23
DOI: 10.1109/ICC47138.2019.9123209
Citations: 122
Abstract
Decentralized stochastic gradient descent (SGD) is a promising approach for learning a machine learning model over a network of workers connected in an arbitrary topology. Although a densely connected network topology can ensure faster convergence in terms of iterations, it incurs more communication time/delay per iteration, resulting in longer training time. In this paper, we propose a novel algorithm, MATCHA, to achieve a win-win in this error-runtime trade-off. MATCHA uses matching decomposition sampling of the base topology to parallelize inter-worker information exchange and thereby significantly reduce communication delay. At the same time, the algorithm communicates more frequently over critical links, so that it maintains the same convergence rate as vanilla decentralized SGD. Experiments on a suite of datasets and deep neural networks validate the theoretical analysis and demonstrate the effectiveness of the proposed scheme in reducing communication delay.
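To make the matching-decomposition-sampling idea concrete, below is a minimal Python sketch, not the authors' implementation. It greedily partitions a base topology's edges into matchings (edge sets sharing no vertex, so all links in a matching can run in parallel) and, each iteration, activates every matching independently at random. The uniform activation probability `p`, the greedy decomposition, and all function names are assumptions for illustration; the paper instead optimizes per-matching activation probabilities so that critical links communicate more frequently.

```python
import random

def matching_decomposition(edges):
    """Greedily partition an edge list into matchings.
    Not guaranteed to use the minimum number of matchings,
    but sufficient to illustrate the decomposition step."""
    matchings = []
    remaining = list(edges)
    while remaining:
        used, matching, rest = set(), [], []
        for (u, v) in remaining:
            if u not in used and v not in used:
                matching.append((u, v))   # no vertex conflict: add to this matching
                used.update((u, v))
            else:
                rest.append((u, v))       # defer to a later matching
        matchings.append(matching)
        remaining = rest
    return matchings

def sample_active_links(matchings, p=0.5, rng=random):
    """Activate each matching independently with probability p;
    workers only exchange models over active links this iteration,
    so expected communication drops to roughly p * |E|."""
    active = []
    for m in matchings:
        if rng.random() < p:
            active.extend(m)
    return active

# Example: a 5-worker ring topology.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
matchings = matching_decomposition(ring)
print("matchings:", matchings)
print("active links this iteration:", sample_active_links(matchings, p=0.5))
```

In a full decentralized SGD loop, each worker would take a local gradient step and then average its model only with neighbors on the sampled active links, rather than over the entire base topology every iteration.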