On Efficient Training of Large-Scale Deep Learning Models

Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao

ACM Computing Surveys · Published 2024-10-11 · DOI: 10.1145/3700439 (https://doi.org/10.1145/3700439)
Citations: 0
Abstract
The field of deep learning has made significant progress in recent years, particularly in areas such as computer vision (CV), natural language processing (NLP), and speech. Large-scale models trained on vast amounts of data hold immense promise for practical applications, enhancing industrial productivity and facilitating social development. However, training such models suffers from an unstable optimization process and stringent computational-resource requirements. As the demand for computational capacity keeps growing, and although numerous studies have explored efficient training to some extent, a comprehensive summary of and guideline for the general acceleration techniques used to train large-scale deep learning models is still much needed. In this survey, we present a detailed review of general techniques for training acceleration. We consider the fundamental update formulation and split its basic components into five main perspectives: (1) "data-centric": dataset regularization, data sampling, and data-centric curriculum learning techniques, which can significantly reduce the computational cost of processing data samples; (2) "model-centric": acceleration of basic modules, compression training, model initialization, and model-centric curriculum learning techniques, which accelerate training by reducing the computation spent on parameters and by providing better initializations; (3) "optimization-centric": the selection of learning rates, the use of large batch sizes, the design of efficient objectives, and model averaging techniques, which concern the training policy and the generalization of large-scale models; (4) "budgeted training": acceleration methods specific to resource-constrained settings, e.g., a limit on the total number of iterations; (5) "system-centric": efficient distributed frameworks and open-source libraries that provide adequate hardware support for implementing the aforementioned acceleration algorithms. This taxonomy allows the survey to examine the general mechanisms within each component and their joint interactions. We further provide a detailed analysis and discussion of future work on general acceleration techniques, which may inspire the community to rethink and design novel efficient paradigms. Overall, we hope this survey serves as a valuable guideline for general efficient training.
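To make one of the "optimization-centric" ingredients named in the abstract concrete, here is a minimal sketch of the linear learning-rate scaling rule combined with a warmup schedule for large-batch training. This example is not taken from the survey itself; the function names, base learning rate, and step counts are illustrative assumptions.

```python
# Minimal sketch (illustrative, not from the survey): linear learning-rate
# scaling for large batch sizes plus a linear warmup schedule.

def scaled_lr(base_lr: float, base_batch: int, batch_size: int) -> float:
    """Linear scaling rule: grow the learning rate in proportion to batch size."""
    return base_lr * batch_size / base_batch

def warmup_lr(target_lr: float, step: int, warmup_steps: int) -> float:
    """Ramp the learning rate linearly from near zero to target_lr over warmup_steps."""
    if step < warmup_steps:
        return target_lr * (step + 1) / warmup_steps
    return target_lr

if __name__ == "__main__":
    # Hypothetical numbers: a recipe tuned for batch size 256, scaled up to 4096.
    target = scaled_lr(base_lr=0.1, base_batch=256, batch_size=4096)  # -> 1.6
    for step in (0, 249, 499, 500, 5000):
        print(f"step {step:5d}: lr = {warmup_lr(target, step, warmup_steps=500):.4f}")
```

The warmup phase is commonly paired with the scaling rule because jumping directly to the scaled learning rate can destabilize the earliest steps of large-batch training.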
Journal information:
ACM Computing Surveys is an academic journal that publishes surveys and tutorials on various areas of computing research and practice. The journal aims to provide comprehensive and easily understandable articles that guide readers through the literature and help them understand topics outside their specialties. CSUR is highly regarded, with a 2022 Impact Factor of 16.6, and is ranked 3rd out of 111 journals in the field of Computer Science, Theory & Methods.
ACM Computing Surveys is indexed and abstracted in various services, including AI2 Semantic Scholar, Baidu, Clarivate/ISI: JCR, CNKI, DeepDyve, DTU, EBSCO: EDS/HOST, and IET Inspec, among others.