Anytime Exploitation of Stragglers in Synchronous Stochastic Gradient Descent
Nuwan S. Ferdinand, Benjamin Gharachorloo, S. Draper
2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 141-146, December 2017
DOI: 10.1109/ICMLA.2017.0-166
Citations: 21
Abstract
In this paper we propose an approach to parallelizing synchronous stochastic gradient descent (SGD) that we term “Anytime-Gradients”. Anytime-Gradients is designed to exploit the work completed by slow compute nodes, or “stragglers”. In many existing approaches, the partial work completed by these nodes is discarded entirely. To maintain synchronization in our approach, each computational epoch is of fixed duration, and at the end of each epoch, workers send updated parameter vectors to a master node for combination. The master weights each update by the amount of work done. The Anytime-Gradients scheme is robust to both persistent and non-persistent stragglers and requires no prior knowledge about processor abilities. We show that the scheme effectively exploits stragglers and outperforms existing methods.
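To make the master's combination step concrete, below is a minimal sketch of one plausible reading of the rule "weights each update by the amount of work done": each worker's parameter vector is weighted in proportion to the number of mini-batch gradient steps it completed during the fixed-length epoch. The function name `combine_updates`, the use of step counts as the work measure, and the exact normalization are assumptions for illustration, not the paper's verbatim algorithm.

```python
import numpy as np

def combine_updates(param_vectors, work_counts):
    """Master-side combination step (hypothetical sketch).

    Each worker's updated parameter vector is weighted by the amount
    of work it completed during the fixed-duration epoch (here, the
    number of mini-batch gradient steps), so stragglers' partial work
    still contributes rather than being discarded.
    """
    weights = np.asarray(work_counts, dtype=float)
    weights /= weights.sum()            # normalize weights to sum to 1
    stacked = np.stack(param_vectors)   # shape: (num_workers, dim)
    return weights @ stacked            # work-weighted average of updates

# Example: three workers; the straggler (5 steps) still contributes,
# just with proportionally less weight than the fast workers.
thetas = [np.array([1.0, 2.0]), np.array([1.2, 1.8]), np.array([0.9, 2.1])]
steps = [20, 18, 5]
print(combine_updates(thetas, steps))
```

Because the epoch length is fixed in wall-clock time rather than in iterations, this combination needs no prior knowledge of processor speeds: a persistently slow node simply reports a smaller step count every epoch, and a transiently slow node is down-weighted only in the epochs where it lags.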