FreezePipe: An Efficient Dynamic Pipeline Parallel Approach Based on Freezing Mechanism for Distributed DNN Training
Authors: Caishan Weng, Zhiyang Shu, Zhengjia Xu, Jinghui Zhang, Junzhou Luo, Fang Dong, Peng Wang, Zhengang Wang
DOI: 10.1109/CSCWD57460.2023.10152643 (https://doi.org/10.1109/CSCWD57460.2023.10152643)
Journal: Computer Supported Cooperative Work-The Journal of Collaborative Computing, volume 10 1, pages 303-308
Publication date: 2023-05-24
Publication type: Journal Article
Impact factor: 2.0 (JCR Q3, Computer Science, Interdisciplinary Applications)
Open access: no
Citation count: 0
Abstract
Training Deep Neural Networks (DNNs) at large scale is extremely time-consuming and computationally intensive, and distributed training is used to accelerate it. In recent years, pipeline parallelism has been developed: the model is partitioned across several devices (e.g., GPUs), and training efficiency is improved by dividing each data batch into micro-batches so that different stages of the model process different micro-batches concurrently. Current pipeline-parallel training assumes that pipeline placement and partitioning are static and that all parameters are updated in every iteration, without accounting for layer freezing. As a result, computational resources are not fully utilized. In this paper, we propose FreezePipe, a novel method for optimizing deep learning training that combines the freezing mechanism with pipeline-parallel training. FreezePipe employs a lightweight method that determines the freezing strategy from gradient changes. Because resources are released when layers are frozen, a lightweight model partitioning algorithm is designed to determine the optimal pipeline partitioning strategy. Experimental results show that FreezePipe can reduce training time by 64.5% compared to Torchgpipe on the CIFAR-10 dataset without compromising model performance.
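The abstract does not give the freezing criterion or the partitioning algorithm, so the following is only a minimal sketch of the general idea, not FreezePipe's actual method. It assumes a hypothetical rule (freeze the leading layers whose gradient norm changed by less than a relative `threshold` between two checks) and swaps in the classic linear-partition dynamic program as a stand-in for the paper's partitioner. All function names, the threshold, and the per-layer cost model are illustrative assumptions.

```python
from typing import List, Sequence


def frozen_prefix(prev_norms: Sequence[float], curr_norms: Sequence[float],
                  threshold: float = 0.05) -> int:
    """Return how many leading layers to freeze (hypothetical criterion).

    A layer counts as converged when the relative change of its gradient
    norm between two checks is below `threshold`. Only a contiguous prefix
    is frozen, so backpropagation can stop before reaching those layers.
    """
    count = 0
    for prev, curr in zip(prev_norms, curr_norms):
        rel_change = abs(curr - prev) / max(abs(prev), 1e-12)
        if rel_change >= threshold:
            break
        count += 1
    return count


def layer_costs(fwd: Sequence[float], bwd: Sequence[float],
                n_frozen: int) -> List[float]:
    """Assumed cost model: frozen layers pay only for the forward pass."""
    return [f if i < n_frozen else f + b
            for i, (f, b) in enumerate(zip(fwd, bwd))]


def partition(costs: Sequence[float], n_stages: int) -> List[int]:
    """Split layers into contiguous stages, minimising the slowest stage.

    Returns the index of the first layer in each stage. Requires
    len(costs) >= n_stages. This is the textbook linear-partition DP,
    used here only as a stand-in for the paper's algorithm.
    """
    n = len(costs)
    prefix = [0.0]
    for c in costs:
        prefix.append(prefix[-1] + c)
    inf = float("inf")
    # best[k][i]: smallest achievable bottleneck for the first i layers in k stages
    best = [[inf] * (n + 1) for _ in range(n_stages + 1)]
    cut = [[0] * (n + 1) for _ in range(n_stages + 1)]
    best[0][0] = 0.0
    for k in range(1, n_stages + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):
                cand = max(best[k - 1][j], prefix[i] - prefix[j])
                if cand < best[k][i]:
                    best[k][i] = cand
                    cut[k][i] = j
    # Walk the recorded cuts backwards to recover each stage's first layer.
    starts = []
    i = n
    for k in range(n_stages, 0, -1):
        i = cut[k][i]
        starts.append(i)
    return starts[::-1]


if __name__ == "__main__":
    fwd, bwd = [1.0] * 4, [2.0] * 4
    n_frozen = frozen_prefix([1.0, 1.0, 1.0, 1.0], [1.01, 0.98, 0.5, 1.0])
    print(n_frozen)                                          # 2
    print(partition(layer_costs(fwd, bwd, 0), 2))            # [0, 2]
    print(partition(layer_costs(fwd, bwd, n_frozen), 2))     # [0, 3]
```

In this toy setting, freezing the first two layers removes their backward cost, so the balanced split moves from two layers per stage to three layers on the first stage and one on the second. This is the kind of repartitioning that a static pipeline would miss.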
About the journal:
Computer Supported Cooperative Work (CSCW): The Journal of Collaborative Computing and Work Practices is devoted to innovative research in computer-supported cooperative work (CSCW). It provides an interdisciplinary and international forum for the debate and exchange of ideas concerning theoretical, practical, technical, and social issues in CSCW.
The CSCW Journal arose in response to the growing interest in the design, implementation and use of technical systems (including computing, information, and communications technologies) which support people working cooperatively, and its scope remains to encompass the multifarious aspects of research within CSCW and related areas.
The CSCW Journal focuses on research oriented towards the development of collaborative computing technologies on the basis of studies of actual cooperative work practices (where ‘work’ is used in the wider sense). That is, it welcomes in particular submissions that (a) report on findings from ethnographic or similar kinds of in-depth fieldwork of work practices with a view to their technological implications, (b) report on empirical evaluations of the use of extant or novel technical solutions under real-world conditions, and/or (c) develop technical or conceptual frameworks for practice-oriented computing research based on previous fieldwork and evaluations.