{"title":"利用选择性多播实现对等分布式学习的高效参数同步","authors":"Shouxi Luo;Pingzhi Fan;Ke Li;Huanlai Xing;Long Luo;Hongfang Yu","doi":"10.1109/TSC.2024.3506480","DOIUrl":null,"url":null,"abstract":"Recent advances in distributed machine learning show theoretically and empirically that, for many models, provided that workers will eventually participate in the synchronizations, <inline-formula><tex-math>$i)$</tex-math></inline-formula> the training still converges, even if only <inline-formula><tex-math>$p$</tex-math></inline-formula> workers take part in each round of synchronization, and <inline-formula><tex-math>$ii)$</tex-math></inline-formula> a larger <inline-formula><tex-math>$p$</tex-math></inline-formula> generally leads to a faster rate of convergence. These findings shed light on eliminating the bottleneck effects of parameter synchronization in large-scale data-parallel distributed training and have motivated several optimization designs. In this paper, we focus on optimizing the parameter synchronization for <i>peer-to-peer</i> distributed learning, where workers broadcast or multicast their updated parameters to others for synchronization, and propose <small>SelMcast</small>, a suite of expressive and efficient multicast receiver selection algorithms, to achieve the goal. Compared with the state-of-the-art (SOTA) design, which randomly selects exactly <inline-formula><tex-math>$p$</tex-math></inline-formula> receivers for each worker’s multicast in a bandwidth-agnostic way, <small>SelMcast</small> chooses receivers based on the global view of their available bandwidth and loads, yielding two advantages, i.e., accelerated parameter synchronization for higher utilization of computing resources and enlarged average <inline-formula><tex-math>$p$</tex-math></inline-formula> values for faster convergence. Comprehensive evaluations show that <small>SelMcast</small> is efficient for both peer-to-peer Bulk Synchronous Parallel (BSP) and Stale Synchronous Parallel (SSP) distributed training, outperforming the SOTA solution significantly.","PeriodicalId":13255,"journal":{"name":"IEEE Transactions on Services Computing","volume":"18 1","pages":"156-168"},"PeriodicalIF":5.5000,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Efficient Parameter Synchronization for Peer-to-Peer Distributed Learning With Selective Multicast\",\"authors\":\"Shouxi Luo;Pingzhi Fan;Ke Li;Huanlai Xing;Long Luo;Hongfang Yu\",\"doi\":\"10.1109/TSC.2024.3506480\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent advances in distributed machine learning show theoretically and empirically that, for many models, provided that workers will eventually participate in the synchronizations, <inline-formula><tex-math>$i)$</tex-math></inline-formula> the training still converges, even if only <inline-formula><tex-math>$p$</tex-math></inline-formula> workers take part in each round of synchronization, and <inline-formula><tex-math>$ii)$</tex-math></inline-formula> a larger <inline-formula><tex-math>$p$</tex-math></inline-formula> generally leads to a faster rate of convergence. These findings shed light on eliminating the bottleneck effects of parameter synchronization in large-scale data-parallel distributed training and have motivated several optimization designs. 
In this paper, we focus on optimizing the parameter synchronization for <i>peer-to-peer</i> distributed learning, where workers broadcast or multicast their updated parameters to others for synchronization, and propose <small>SelMcast</small>, a suite of expressive and efficient multicast receiver selection algorithms, to achieve the goal. Compared with the state-of-the-art (SOTA) design, which randomly selects exactly <inline-formula><tex-math>$p$</tex-math></inline-formula> receivers for each worker’s multicast in a bandwidth-agnostic way, <small>SelMcast</small> chooses receivers based on the global view of their available bandwidth and loads, yielding two advantages, i.e., accelerated parameter synchronization for higher utilization of computing resources and enlarged average <inline-formula><tex-math>$p$</tex-math></inline-formula> values for faster convergence. Comprehensive evaluations show that <small>SelMcast</small> is efficient for both peer-to-peer Bulk Synchronous Parallel (BSP) and Stale Synchronous Parallel (SSP) distributed training, outperforming the SOTA solution significantly.\",\"PeriodicalId\":13255,\"journal\":{\"name\":\"IEEE Transactions on Services Computing\",\"volume\":\"18 1\",\"pages\":\"156-168\"},\"PeriodicalIF\":5.5000,\"publicationDate\":\"2024-11-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Services Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10767301/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Services Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10767301/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Efficient Parameter Synchronization for Peer-to-Peer Distributed Learning With Selective Multicast
Recent advances in distributed machine learning show, both theoretically and empirically, that for many models, provided every worker eventually participates in the synchronizations, i) training still converges even if only $p$ workers take part in each round of synchronization, and ii) a larger $p$ generally leads to a faster rate of convergence. These findings shed light on how to eliminate the bottleneck effect of parameter synchronization in large-scale data-parallel distributed training and have motivated several optimization designs. In this paper, we focus on optimizing parameter synchronization for peer-to-peer distributed learning, where workers broadcast or multicast their updated parameters to others for synchronization, and propose SelMcast, a suite of expressive and efficient multicast receiver selection algorithms. Compared with the state-of-the-art (SOTA) design, which randomly selects exactly $p$ receivers for each worker's multicast in a bandwidth-agnostic way, SelMcast chooses receivers based on a global view of their available bandwidth and loads, yielding two advantages: accelerated parameter synchronization, and thus higher utilization of computing resources, and enlarged average $p$ values for faster convergence. Comprehensive evaluations show that SelMcast is efficient for both peer-to-peer Bulk Synchronous Parallel (BSP) and Stale Synchronous Parallel (SSP) distributed training, significantly outperforming the SOTA solution.
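To make the contrast in the abstract concrete, the following is a minimal, illustrative sketch in Python of the two receiver-selection policies it describes: a bandwidth-agnostic baseline that picks exactly $p$ receivers uniformly at random, and a hypothetical bandwidth/load-aware greedy rule in the spirit of SelMcast. The Worker fields, the scoring function, and all names are assumptions introduced here for illustration; they do not reproduce the paper's actual SelMcast algorithms.

```python
import random
from dataclasses import dataclass


@dataclass
class Worker:
    """A peer in the training job; bandwidth and load values are illustrative."""
    name: str
    available_bandwidth: float  # spare capacity left for synchronization traffic (assumed metric)
    load: int = 0               # multicasts this peer is already set to receive this round


def select_random(sender: str, peers: list[Worker], p: int) -> list[Worker]:
    """SOTA-style baseline: pick exactly p receivers uniformly at random,
    ignoring bandwidth and current load (bandwidth-agnostic)."""
    candidates = [w for w in peers if w.name != sender]
    return random.sample(candidates, min(p, len(candidates)))


def select_bandwidth_aware(sender: str, peers: list[Worker], p: int) -> list[Worker]:
    """Hypothetical bandwidth/load-aware selection: greedily prefer peers with
    more spare bandwidth and fewer pending receptions. The scoring rule is an
    assumption for illustration, not the paper's algorithm."""
    candidates = [w for w in peers if w.name != sender]
    candidates.sort(key=lambda w: w.available_bandwidth / (1 + w.load), reverse=True)
    chosen = candidates[:min(p, len(candidates))]
    for w in chosen:
        w.load += 1  # book-keeping so later senders see the updated loads
    return chosen


if __name__ == "__main__":
    random.seed(0)
    workers = [Worker(f"w{i}", available_bandwidth=random.uniform(1.0, 10.0)) for i in range(6)]
    p = 3
    for sender in workers:
        rnd = select_random(sender.name, workers, p)
        aware = select_bandwidth_aware(sender.name, workers, p)
        print(f"{sender.name}: random={[r.name for r in rnd]}  aware={[r.name for r in aware]}")
```

Under a score of this kind, slow or heavily loaded peers are chosen less often, which matches the intuition stated in the abstract, i.e., faster synchronization and a larger effective average $p$; the real SelMcast algorithms operate on a global view of the cluster rather than this simple local heuristic.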
Journal Introduction:
IEEE Transactions on Services Computing covers the computing and software aspects of the science and technology of services innovation research and development, with an emphasis on the algorithmic, mathematical, statistical, and computational methods central to services computing. Topics include Service Oriented Architecture, Web Services, Business Process Integration, Solution Performance Management, and Services Operations and Management. The transactions also address mathematical foundations, security, privacy, agreement, contract, discovery, negotiation, collaboration, and quality of service for web services, as well as composite web service creation, business and scientific applications, standards, utility models, and business process modeling and integration in the realm of Services Computing.