PipeOptim: Ensuring Effective 1F1B Schedule With Optimizer-Dependent Weight Prediction
Lei Guan; Dongsheng Li; Yongle Chen; Jiye Liang; Wenjian Wang; Xicheng Lu
IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 5, pp. 2831-2845, published 2025-02-18. DOI: 10.1109/TKDE.2025.3543225
Abstract
Asynchronous pipeline model parallelism with a “1F1B” (one-forward, one-backward) schedule incurs little bubble overhead and consistently delivers high throughput. However, the “1F1B” schedule inevitably leads to weight inconsistency and weight staleness because different mini-batches are trained concurrently across GPUs. To address both problems simultaneously, we propose an optimizer-dependent weight prediction strategy, PipeOptim, for asynchronous pipeline training. The key insight is to apply weight prediction in the forward pass so that each mini-batch computes its forward pass of the “1F1B” schedule with approximately consistent, staleness-free weights. Concretely, we first construct the weight prediction scheme from the update rule of the optimizer used to train the deep neural network. Then, throughout “1F1B” pipeline training, each mini-batch first performs weight prediction and then uses the predicted weights for its forward pass. As a result, PipeOptim 1) inherits the advantage of the “1F1B” schedule and achieves high throughput, and 2) ensures effective parameter learning regardless of the optimizer in use. We conducted extensive experimental evaluations on nine deep-learning models to verify the effectiveness of our proposal. The results demonstrate that PipeOptim outperforms five other popular pipeline approaches: GPipe, PipeDream, PipeDream-2BW, SpecTrain, and XPipe.
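To make the mechanism concrete, below is a minimal sketch of optimizer-dependent weight prediction for the simplest case, SGD with momentum; the paper derives the prediction from the update rule of whichever optimizer is actually in use, so a different optimizer (e.g., Adam) would yield a different formula. All names here (`predict_weights`, `staleness`) are illustrative assumptions, not the authors' API.

```python
# A minimal sketch (not the authors' code) of optimizer-dependent weight
# prediction, specialized to SGD with momentum. Under momentum SGD, each
# optimizer step moves a weight by roughly -lr * v (v = momentum buffer),
# so a stage whose forward pass runs `staleness` steps ahead of its
# backward pass can extrapolate: w_hat = w - staleness * lr * v.
import torch

def predict_weights(params, momentum_buffers, lr, staleness):
    """Return weights extrapolated `staleness` optimizer steps ahead."""
    with torch.no_grad():
        return [p - staleness * lr * v
                for p, v in zip(params, momentum_buffers)]

# Usage: the forward pass is computed with the predicted weights, while
# the true weights are only ever changed by the real optimizer updates.
params = [torch.randn(4, 4, requires_grad=True)]
buffers = [torch.zeros_like(p) for p in params]  # momentum buffers
w_hat = predict_weights(params, buffers, lr=0.1, staleness=2)
```

Here `staleness` is the number of optimizer steps between a mini-batch's forward pass and its corresponding backward pass on a given stage, which is what makes the prediction stage-dependent in a 1F1B pipeline.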
Journal Introduction:
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.