R-FAST: Robust Fully-Asynchronous Stochastic Gradient Tracking Over General Topology

IF 3 | Zone 3, Computer Science | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC
Zehan Zhu;Ye Tian;Yan Huang;Jinming Xu;Shibo He
DOI: 10.1109/TSIPN.2024.3444484
Journal: IEEE Transactions on Signal and Information Processing over Networks, vol. 10, pp. 665-678
Publication date: 2024-08-30
Publication type: Journal Article
Open access: No
URL: https://ieeexplore.ieee.org/document/10660468/
Citations: 0

Abstract

We propose a Robust Fully-Asynchronous Stochastic Gradient Tracking method (R-FAST) for distributed machine learning problems over a network of nodes, where each node performs local computation and communication at its own pace without any form of synchronization. Unlike existing asynchronous distributed algorithms, R-FAST can eliminate the impact of data heterogeneity across nodes on convergence performance and allow for packet losses by employing a robust gradient tracking strategy that relies on properly designed auxiliary variables for tracking and buffering the overall gradient vector. Moreover, the proposed method utilizes two spanning-tree graphs for communication, which need only share at least one common root, enabling flexible designs in communication topologies. We show that R-FAST converges in expectation to a neighborhood of the optimum at a geometric rate for smooth and strongly convex objectives, and to a stationary point at a sublinear rate for general non-convex problems. Extensive experiments demonstrate that R-FAST runs 1.5-2 times faster than synchronous benchmark algorithms, such as Ring-AllReduce and D-PSGD, while still achieving comparable accuracy, and outperforms existing well-known asynchronous algorithms, such as AD-PSGD and OSGP, especially in the presence of stragglers.
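For readers unfamiliar with gradient tracking, the sketch below may help show why a tracking variable removes the effect of data heterogeneity. It is not R-FAST: it implements the standard synchronous gradient tracking iteration with exact gradients on a ring, with no asynchrony, packet loss, or spanning-tree communication. The quadratic local objectives f_i(x) = 0.5 * a_i * (x - b_i)^2, the helper names (`ring_mixing_matrix`, `mix`, `gradient_tracking`), the step size, and the data are all assumptions chosen for illustration.

```python
from typing import List


def ring_mixing_matrix(n: int) -> List[List[float]]:
    """Doubly stochastic weights on a ring (n >= 3): 1/3 for self and each neighbor."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            w[i][j % n] += 1.0 / 3.0
    return w


def mix(w: List[List[float]], v: List[float]) -> List[float]:
    """One round of neighbor averaging: returns the matrix-vector product W v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def gradient_tracking(a: List[float], b: List[float],
                      alpha: float = 0.02, iters: int = 2000) -> List[float]:
    """Synchronous gradient tracking on f_i(x) = 0.5 * a_i * (x - b_i)^2.

    x[i] is node i's estimate; y[i] tracks the network-average gradient.
    """
    n = len(a)
    w = ring_mixing_matrix(n)

    def grad(i: int, x: float) -> float:
        return a[i] * (x - b[i])

    x = [0.0] * n
    g = [grad(i, x[i]) for i in range(n)]
    y = g[:]  # tracker starts at the local gradients
    for _ in range(iters):
        wx = mix(w, x)
        x_new = [wx[i] - alpha * y[i] for i in range(n)]
        g_new = [grad(i, x_new[i]) for i in range(n)]
        wy = mix(w, y)
        # Adding the gradient change keeps sum(y) equal to sum of local gradients.
        y = [wy[i] + g_new[i] - g[i] for i in range(n)]
        x, g = x_new, g_new
    return x


if __name__ == "__main__":
    a = [1.0, 1.5, 2.0, 1.2, 1.8]      # heterogeneous curvatures (illustrative)
    b = [-4.0, 0.0, 3.0, 10.0, -1.0]   # heterogeneous local minimizers (illustrative)
    x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)
    print("global minimizer:", x_star)
    print("node estimates:  ", gradient_tracking(a, b))
```

With exact gradients, every node's estimate approaches the minimizer of the summed objective even though the local minimizers b_i differ widely. Replacing the exact gradients with noisy ones would, for a fixed step size, typically leave the iterates in a neighborhood of the optimum, which matches the kind of guarantee the abstract states for the strongly convex case.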
Source journal
IEEE Transactions on Signal and Information Processing over Networks
Subject area: Computer Science, Computer Networks and Communications
CiteScore: 5.80
Self-citation rate: 12.50%
Articles published: 56
About the journal: The IEEE Transactions on Signal and Information Processing over Networks publishes high-quality papers that extend the classical notions of processing of signals defined over vector spaces (e.g. time and space) to processing of signals and information (data) defined over networks, potentially dynamically varying. In signal processing over networks, the topology of the network may define structural relationships in the data, or may constrain processing of the data. Topics include distributed algorithms for filtering, detection, estimation, adaptation and learning, model selection, data fusion, and diffusion or evolution of information over such networks, and applications of distributed signal processing.