{"title":"利用选择性多播实现对等分布式学习的高效参数同步","authors":"Shouxi Luo;Pingzhi Fan;Ke Li;Huanlai Xing;Long Luo;Hongfang Yu","doi":"10.1109/TSC.2024.3506480","DOIUrl":null,"url":null,"abstract":"Recent advances in distributed machine learning show theoretically and empirically that, for many models, provided that workers will eventually participate in the synchronizations, <inline-formula><tex-math>$i)$</tex-math></inline-formula> the training still converges, even if only <inline-formula><tex-math>$p$</tex-math></inline-formula> workers take part in each round of synchronization, and <inline-formula><tex-math>$ii)$</tex-math></inline-formula> a larger <inline-formula><tex-math>$p$</tex-math></inline-formula> generally leads to a faster rate of convergence. These findings shed light on eliminating the bottleneck effects of parameter synchronization in large-scale data-parallel distributed training and have motivated several optimization designs. In this paper, we focus on optimizing the parameter synchronization for <i>peer-to-peer</i> distributed learning, where workers broadcast or multicast their updated parameters to others for synchronization, and propose <small>SelMcast</small>, a suite of expressive and efficient multicast receiver selection algorithms, to achieve the goal. Compared with the state-of-the-art (SOTA) design, which randomly selects exactly <inline-formula><tex-math>$p$</tex-math></inline-formula> receivers for each worker’s multicast in a bandwidth-agnostic way, <small>SelMcast</small> chooses receivers based on the global view of their available bandwidth and loads, yielding two advantages, i.e., accelerated parameter synchronization for higher utilization of computing resources and enlarged average <inline-formula><tex-math>$p$</tex-math></inline-formula> values for faster convergence. Comprehensive evaluations show that <small>SelMcast</small> is efficient for both peer-to-peer Bulk Synchronous Parallel (BSP) and Stale Synchronous Parallel (SSP) distributed training, outperforming the SOTA solution significantly.","PeriodicalId":13255,"journal":{"name":"IEEE Transactions on Services Computing","volume":"18 1","pages":"156-168"},"PeriodicalIF":5.5000,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Efficient Parameter Synchronization for Peer-to-Peer Distributed Learning With Selective Multicast\",\"authors\":\"Shouxi Luo;Pingzhi Fan;Ke Li;Huanlai Xing;Long Luo;Hongfang Yu\",\"doi\":\"10.1109/TSC.2024.3506480\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent advances in distributed machine learning show theoretically and empirically that, for many models, provided that workers will eventually participate in the synchronizations, <inline-formula><tex-math>$i)$</tex-math></inline-formula> the training still converges, even if only <inline-formula><tex-math>$p$</tex-math></inline-formula> workers take part in each round of synchronization, and <inline-formula><tex-math>$ii)$</tex-math></inline-formula> a larger <inline-formula><tex-math>$p$</tex-math></inline-formula> generally leads to a faster rate of convergence. These findings shed light on eliminating the bottleneck effects of parameter synchronization in large-scale data-parallel distributed training and have motivated several optimization designs. 
In this paper, we focus on optimizing the parameter synchronization for <i>peer-to-peer</i> distributed learning, where workers broadcast or multicast their updated parameters to others for synchronization, and propose <small>SelMcast</small>, a suite of expressive and efficient multicast receiver selection algorithms, to achieve the goal. Compared with the state-of-the-art (SOTA) design, which randomly selects exactly <inline-formula><tex-math>$p$</tex-math></inline-formula> receivers for each worker’s multicast in a bandwidth-agnostic way, <small>SelMcast</small> chooses receivers based on the global view of their available bandwidth and loads, yielding two advantages, i.e., accelerated parameter synchronization for higher utilization of computing resources and enlarged average <inline-formula><tex-math>$p$</tex-math></inline-formula> values for faster convergence. Comprehensive evaluations show that <small>SelMcast</small> is efficient for both peer-to-peer Bulk Synchronous Parallel (BSP) and Stale Synchronous Parallel (SSP) distributed training, outperforming the SOTA solution significantly.\",\"PeriodicalId\":13255,\"journal\":{\"name\":\"IEEE Transactions on Services Computing\",\"volume\":\"18 1\",\"pages\":\"156-168\"},\"PeriodicalIF\":5.5000,\"publicationDate\":\"2024-11-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Services Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10767301/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Services Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10767301/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Efficient Parameter Synchronization for Peer-to-Peer Distributed Learning With Selective Multicast
Recent advances in distributed machine learning show, both theoretically and empirically, that for many models, provided every worker eventually participates in the synchronizations, i) training still converges even if only $p$ workers take part in each round of synchronization, and ii) a larger $p$ generally leads to a faster rate of convergence. These findings shed light on how to eliminate the bottleneck effect of parameter synchronization in large-scale data-parallel distributed training and have motivated several optimization designs. In this paper, we focus on optimizing parameter synchronization for peer-to-peer distributed learning, where workers broadcast or multicast their updated parameters to others for synchronization, and propose SelMcast, a suite of expressive and efficient multicast receiver selection algorithms. Compared with the state-of-the-art (SOTA) design, which randomly selects exactly $p$ receivers for each worker's multicast in a bandwidth-agnostic way, SelMcast chooses receivers based on a global view of their available bandwidth and loads, yielding two advantages: accelerated parameter synchronization, and thus higher utilization of computing resources, and enlarged average $p$ values for faster convergence. Comprehensive evaluations show that SelMcast is efficient for both peer-to-peer Bulk Synchronous Parallel (BSP) and Stale Synchronous Parallel (SSP) distributed training, significantly outperforming the SOTA solution.
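To make the contrast in the abstract concrete, the following is a minimal, illustrative sketch in Python of the two receiver-selection policies it describes: a bandwidth-agnostic baseline that picks exactly $p$ receivers uniformly at random, and a hypothetical bandwidth/load-aware greedy rule in the spirit of SelMcast. The Worker fields, the scoring function, and all names are assumptions introduced here for illustration; they do not reproduce the paper's actual SelMcast algorithms.

```python
import random
from dataclasses import dataclass


@dataclass
class Worker:
    """A peer in the training job; bandwidth and load values are illustrative."""
    name: str
    available_bandwidth: float  # spare capacity left for synchronization traffic (assumed metric)
    load: int = 0               # multicasts this peer is already set to receive this round


def select_random(sender: str, peers: list[Worker], p: int) -> list[Worker]:
    """SOTA-style baseline: pick exactly p receivers uniformly at random,
    ignoring bandwidth and current load (bandwidth-agnostic)."""
    candidates = [w for w in peers if w.name != sender]
    return random.sample(candidates, min(p, len(candidates)))


def select_bandwidth_aware(sender: str, peers: list[Worker], p: int) -> list[Worker]:
    """Hypothetical bandwidth/load-aware selection: greedily prefer peers with
    more spare bandwidth and fewer pending receptions. The scoring rule is an
    assumption for illustration, not the paper's algorithm."""
    candidates = [w for w in peers if w.name != sender]
    candidates.sort(key=lambda w: w.available_bandwidth / (1 + w.load), reverse=True)
    chosen = candidates[:min(p, len(candidates))]
    for w in chosen:
        w.load += 1  # book-keeping so later senders see the updated loads
    return chosen


if __name__ == "__main__":
    random.seed(0)
    workers = [Worker(f"w{i}", available_bandwidth=random.uniform(1.0, 10.0)) for i in range(6)]
    p = 3
    for sender in workers:
        rnd = select_random(sender.name, workers, p)
        aware = select_bandwidth_aware(sender.name, workers, p)
        print(f"{sender.name}: random={[r.name for r in rnd]}  aware={[r.name for r in aware]}")
```

Under a score of this kind, slow or heavily loaded peers are chosen less often, which matches the intuition stated in the abstract, i.e., faster synchronization and a larger effective average $p$; the real SelMcast algorithms operate on a global view of the cluster rather than this simple local heuristic.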
Journal Introduction:
IEEE Transactions on Services Computing covers the computing and software aspects of the science and technology of services innovation research and development, with an emphasis on the algorithmic, mathematical, statistical, and computational methods central to services computing. Topics include Service Oriented Architecture, Web Services, Business Process Integration, Solution Performance Management, and Services Operations and Management. The transactions also address mathematical foundations, security, privacy, agreement, contract, discovery, negotiation, collaboration, and quality of service for web services, as well as composite web service creation, business and scientific applications, standards, utility models, and business process modeling and integration in the realm of Services Computing.