Anytime Exploitation of Stragglers in Synchronous Stochastic Gradient Descent
Nuwan S. Ferdinand, Benjamin Gharachorloo, S. Draper
2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 141-146, December 2017. DOI: 10.1109/ICMLA.2017.0-166
In this paper we propose an approach to parallelizing synchronous stochastic gradient descent (SGD) that we term “Anytime-Gradients”. Anytime-Gradients is designed to exploit the work completed by slow compute nodes, or “stragglers”. In many existing approaches, the work completed by these nodes, while only partial, is discarded completely. To maintain synchronization in our approach, each computational epoch is of fixed duration, and at the end of each epoch, workers send updated parameter vectors to a master node for combination. The master weights each update by the amount of work done. The Anytime-Gradients scheme is robust to both persistent and non-persistent stragglers and requires no prior knowledge of processor abilities. We show that the scheme effectively exploits stragglers and outperforms existing methods.
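Based only on the description in the abstract, the fixed-duration epoch structure and the work-weighted combination at the master can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the least-squares objective, the function names worker_epoch and master_combine, and the specific choice of weighting each update by the number of SGD steps completed are assumptions introduced here for concreteness.

```python
# Minimal, hypothetical sketch of the scheme described above: each worker runs
# SGD for a fixed-duration epoch (modeled here by a per-worker step budget),
# and the master combines the returned parameter vectors weighted by the
# amount of work each worker completed. Names and the least-squares objective
# are illustrative assumptions, not details taken from the paper.
import numpy as np

def worker_epoch(theta, X, y, lr, num_steps, rng):
    """One worker's fixed-duration epoch: run however many SGD steps the
    worker manages before the epoch ends (num_steps models worker speed)."""
    theta = theta.copy()
    n = len(y)
    for _ in range(num_steps):
        i = rng.integers(n)                        # sample one training example
        residual = X[i] @ theta - y[i]             # least-squares residual
        theta -= lr * residual * X[i]              # single-sample gradient step
    return theta, num_steps                        # report work actually done

def master_combine(updates):
    """Weighted average of worker parameter vectors, with weights proportional
    to work done, so stragglers' partial progress is used, not discarded."""
    total_work = sum(work for _, work in updates)
    return sum(work * theta for theta, work in updates) / total_work

# Toy run: four workers with different speeds; the slow ones are "stragglers".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5)
theta = np.zeros(5)
for epoch in range(10):
    steps = [50, 40, 10, 5]                        # persistent stragglers: last two
    updates = [worker_epoch(theta, X, y, 0.01, s, rng) for s in steps]
    theta = master_combine(updates)                # work-weighted combination
print("final parameter estimate:", theta)
```

One plausible reading of "weights each update by the amount of work done" is the normalized weighted average used in master_combine above; the paper itself may use a different weighting rule.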