PipeOptim: Ensuring Effective 1F1B Schedule With Optimizer-Dependent Weight Prediction
Lei Guan; Dongsheng Li; Yongle Chen; Jiye Liang; Wenjian Wang; Xicheng Lu
IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 5, pp. 2831-2845, published 2025-02-18. DOI: 10.1109/TKDE.2025.3543225
Abstract
Asynchronous pipeline model parallelism with a “1F1B” (one-forward, one-backward) schedule incurs little bubble overhead and consistently delivers high throughput. However, the “1F1B” schedule inevitably leads to weight inconsistency and weight staleness because different mini-batches are trained concurrently across GPUs. To address both problems simultaneously, we propose an optimizer-dependent weight prediction strategy, PipeOptim, for asynchronous pipeline training. The key insight is to apply weight prediction in the forward pass so that each mini-batch computes its forward pass of the “1F1B” schedule with approximately consistent, staleness-free weights. Concretely, we first construct the weight prediction scheme from the update rule of the optimizer used to train the deep neural network. Then, throughout “1F1B” pipeline training, each mini-batch first performs weight prediction and then uses the predicted weights for its forward pass. As a result, PipeOptim 1) inherits the advantage of the “1F1B” schedule and achieves high throughput, and 2) ensures effective parameter learning regardless of the optimizer in use. We conducted extensive experimental evaluations on nine deep-learning models to verify the effectiveness of our proposal. The results demonstrate that PipeOptim outperforms five other popular pipeline approaches: GPipe, PipeDream, PipeDream-2BW, SpecTrain, and XPipe.
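To make the mechanism concrete, below is a minimal sketch of optimizer-dependent weight prediction for the simplest case, SGD with momentum; the paper derives the prediction from the update rule of whichever optimizer is actually in use, so a different optimizer (e.g., Adam) would yield a different formula. All names here (`predict_weights`, `staleness`) are illustrative assumptions, not the authors' API.

```python
# A minimal sketch (not the authors' code) of optimizer-dependent weight
# prediction, specialized to SGD with momentum. Under momentum SGD, each
# optimizer step moves a weight by roughly -lr * v (v = momentum buffer),
# so a stage whose forward pass runs `staleness` steps ahead of its
# backward pass can extrapolate: w_hat = w - staleness * lr * v.
import torch

def predict_weights(params, momentum_buffers, lr, staleness):
    """Return weights extrapolated `staleness` optimizer steps ahead."""
    with torch.no_grad():
        return [p - staleness * lr * v
                for p, v in zip(params, momentum_buffers)]

# Usage: the forward pass is computed with the predicted weights, while
# the true weights are only ever changed by the real optimizer updates.
params = [torch.randn(4, 4, requires_grad=True)]
buffers = [torch.zeros_like(p) for p in params]  # momentum buffers
w_hat = predict_weights(params, buffers, lr=0.1, staleness=2)
```

Here `staleness` is the number of optimizer steps between a mini-batch's forward pass and its corresponding backward pass on a given stage, which is what makes the prediction stage-dependent in a 1F1B pipeline.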
Journal Introduction:
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.