Xupeng Miao, Yining Shi, Zhi Yang, Bin Cui, Zhihao Jia
{"title":"SDPipe:一个半分散的异构感知管道并行训练框架","authors":"Xupeng Miao, Yining Shi, Zhi Yang, Bin Cui, Zhihao Jia","doi":"10.14778/3598581.3598604","DOIUrl":null,"url":null,"abstract":"\n The increasing size of both deep learning models and training data necessitates the ability to scale out model training through pipeline-parallel training, which combines pipelined model parallelism and data parallelism. However, most of them assume an ideal homogeneous dedicated cluster. As for real cloud clusters, these approaches suffer from the intensive model synchronization overheads due to the dynamic environment heterogeneity. Such a huge challenge leaves the design in a dilemma: either the performance bottleneck of the central parameter server (PS) or severe performance degradation caused by stragglers for decentralized synchronization (like All-Reduce). This approach presents SDPipe, a new\n semi-decentralized\n framework to get the best of both worlds, achieving both high heterogeneity tolerance and convergence efficiency in pipeline-parallel training. To provide high performance, we decentralize the communication model synchronization, which accounts for the largest proportion of synchronization overhead. In contrast, we centralize the process of group scheduling, which is lightweight but needs a global view for better performance and convergence speed against heterogeneity. We show via a prototype implementation the significant advantage of SDPipe on performance and scalability, facing different environments.\n","PeriodicalId":20467,"journal":{"name":"Proc. VLDB Endow.","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"SDPipe: A Semi-Decentralized Framework for Heterogeneity-aware Pipeline-parallel Training\",\"authors\":\"Xupeng Miao, Yining Shi, Zhi Yang, Bin Cui, Zhihao Jia\",\"doi\":\"10.14778/3598581.3598604\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"\\n The increasing size of both deep learning models and training data necessitates the ability to scale out model training through pipeline-parallel training, which combines pipelined model parallelism and data parallelism. However, most of them assume an ideal homogeneous dedicated cluster. As for real cloud clusters, these approaches suffer from the intensive model synchronization overheads due to the dynamic environment heterogeneity. Such a huge challenge leaves the design in a dilemma: either the performance bottleneck of the central parameter server (PS) or severe performance degradation caused by stragglers for decentralized synchronization (like All-Reduce). This approach presents SDPipe, a new\\n semi-decentralized\\n framework to get the best of both worlds, achieving both high heterogeneity tolerance and convergence efficiency in pipeline-parallel training. To provide high performance, we decentralize the communication model synchronization, which accounts for the largest proportion of synchronization overhead. In contrast, we centralize the process of group scheduling, which is lightweight but needs a global view for better performance and convergence speed against heterogeneity. We show via a prototype implementation the significant advantage of SDPipe on performance and scalability, facing different environments.\\n\",\"PeriodicalId\":20467,\"journal\":{\"name\":\"Proc. 
VLDB Endow.\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proc. VLDB Endow.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.14778/3598581.3598604\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proc. VLDB Endow.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14778/3598581.3598604","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
SDPipe: A Semi-Decentralized Framework for Heterogeneity-aware Pipeline-parallel Training
The increasing size of both deep learning models and training data necessitates scaling out model training through pipeline-parallel training, which combines pipelined model parallelism with data parallelism. However, most existing approaches assume an ideal homogeneous, dedicated cluster. In real cloud clusters, these approaches suffer from intensive model synchronization overheads caused by dynamic environment heterogeneity. This challenge leaves the design in a dilemma: either the performance bottleneck of a central parameter server (PS) or severe performance degradation from stragglers under decentralized synchronization (e.g., All-Reduce). This paper presents SDPipe, a new semi-decentralized framework that gets the best of both worlds, achieving both high heterogeneity tolerance and convergence efficiency in pipeline-parallel training. To provide high performance, we decentralize the communication for model synchronization, which accounts for the largest share of synchronization overhead. In contrast, we centralize group scheduling, which is lightweight but needs a global view to improve performance and convergence speed under heterogeneity. Via a prototype implementation, we show SDPipe's significant advantages in performance and scalability across different environments.
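To make the semi-decentralized idea concrete, below is a minimal Python sketch of the split the abstract describes: a lightweight central coordinator groups workers using a global view of their observed speeds, while gradient synchronization runs in a decentralized fashion within each group. This is not the authors' implementation; the names (Worker, schedule_groups, allreduce_average) and the throughput-bucketing heuristic are illustrative assumptions only.

```python
# Hypothetical sketch of semi-decentralized training coordination:
# centralized, lightweight group scheduling + decentralized in-group sync.
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    rank: int
    throughput: float   # measured samples/sec over the last interval
    grad: List[float]   # flattened local gradient (toy example)


def schedule_groups(workers: List[Worker], num_groups: int) -> List[List[Worker]]:
    """Centralized scheduling: sort workers by speed and bucket similar-speed
    workers together, so fast workers are not held back by stragglers."""
    ordered = sorted(workers, key=lambda w: w.throughput, reverse=True)
    size = (len(ordered) + num_groups - 1) // num_groups
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def allreduce_average(group: List[Worker]) -> List[float]:
    """Decentralized synchronization within one group, modeled here as the
    element-wise gradient average a ring all-reduce would compute."""
    dim = len(group[0].grad)
    avg = [sum(w.grad[i] for w in group) / len(group) for i in range(dim)]
    for w in group:
        w.grad = list(avg)  # every member ends up with the group-averaged gradient
    return avg


if __name__ == "__main__":
    workers = [
        Worker(rank=0, throughput=120.0, grad=[1.0, 2.0]),
        Worker(rank=1, throughput=115.0, grad=[3.0, 4.0]),
        Worker(rank=2, throughput=60.0,  grad=[5.0, 6.0]),  # straggler
        Worker(rank=3, throughput=58.0,  grad=[7.0, 8.0]),  # straggler
    ]
    for group in schedule_groups(workers, num_groups=2):
        print([w.rank for w in group], allreduce_average(group))
```

In this toy setup the coordinator only decides membership (a small amount of metadata), while the heavy gradient traffic stays inside each group, which mirrors the paper's claim that the centralized part is lightweight and the decentralized part carries the bulk of the synchronization cost.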