On Efficient Training of Large-Scale Deep Learning Models
Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao
ACM Computing Surveys, published 2024-10-11. DOI: 10.1145/3700439 (https://doi.org/10.1145/3700439)
Citations: 0
Abstract
The field of deep learning has witnessed significant progress in recent years, particularly in areas such as computer vision (CV), natural language processing (NLP), and speech. Large-scale models trained on vast amounts of data hold immense promise for practical applications, enhancing industrial productivity and facilitating social development. However, such models suffer from unstable training processes and stringent demands on computational resources. With the growing demand for computational capacity, and although numerous studies have explored efficient training to a certain extent, a comprehensive summary of and guideline on the general techniques for accelerating the training of large-scale deep learning models is still much anticipated. In this survey, we present a detailed review of the general techniques for training acceleration. We consider the fundamental update formulation and split its basic components into five main perspectives: (1) “data-centric”: dataset regularization, data sampling, and data-centric curriculum learning techniques, which can significantly reduce the computational cost associated with the data samples; (2) “model-centric”: acceleration of basic modules, compression training, model initialization, and model-centric curriculum learning techniques, which accelerate training by reducing the computation over parameters and providing better initialization; (3) “optimization-centric”: selection of the learning rate, use of large batch sizes, design of efficient objectives, and model averaging techniques, which focus on the training policy and on improving the generalization of large-scale models; (4) “budgeted training”: distinctive acceleration methods for resource-constrained situations, e.g., a limit on the total number of iterations; (5) “system-centric”: efficient distributed frameworks and open-source libraries that provide adequate hardware support for implementing the above acceleration algorithms. Through this taxonomy, our survey reviews the general mechanisms within each component and their joint interaction. We further provide a detailed analysis and discussion of future work on the development of general acceleration techniques, which could inspire the community to rethink and design novel efficient paradigms. Overall, we hope that this survey will serve as a valuable guideline for general efficient training.
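To make the taxonomy concrete, the following is a minimal sketch (in our own notation, not reproduced from the survey) of the mini-batch update formulation around which the five perspectives can be organized:

$$ w_{t+1} \;=\; w_t \;-\; \eta_t \, \nabla_w \frac{1}{|\mathcal{B}_t|} \sum_{x \in \mathcal{B}_t} \ell\big(f(x;\, w_t)\big), \qquad t = 0, 1, \dots, T-1. $$

Under this reading, the mini-batch $\mathcal{B}_t$ is the target of the data-centric techniques; the model $f(\cdot;\, w)$ and its parameters $w$ are the targets of the model-centric techniques; the learning rate $\eta_t$, the batch size $|\mathcal{B}_t|$, and the objective $\ell$ belong to the optimization-centric perspective; the total step budget $T$ corresponds to budgeted training; and the hardware/software stack that executes each update is the concern of the system-centric perspective.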
About the journal:
ACM Computing Surveys (CSUR) is an academic journal that publishes surveys and tutorials across areas of computing research and practice. The journal aims to provide comprehensive and easily understandable articles that guide readers through the literature and help them understand topics outside their own specialties. CSUR is highly regarded, with a 2022 Impact Factor of 16.6, ranking 3rd out of 111 journals in the Computer Science, Theory & Methods category.
ACM Computing Surveys is indexed and abstracted in various services, including AI2 Semantic Scholar, Baidu, Clarivate/ISI: JCR, CNKI, DeepDyve, DTU, EBSCO: EDS/HOST, and IET Inspec, among others.