Parallelizing DNN Training on GPUs: Challenges and Opportunities

Companion Proceedings of the Web Conference 2021. Pub Date: 2021-04-19. DOI: 10.1145/3442442.3452055

Weizheng Xu, Youtao Zhang, Xulong Tang


Citations: 7

Abstract

In recent years, Deep Neural Networks (DNNs) have been widely adopted in many application domains, and training DNN models has become a significant fraction of the datacenter workload. Recent evidence shows that modern DNNs are becoming more complex and that the size of their parameters (i.e., weights) is increasing. In addition, a large amount of input data is required to train DNN models to reach target accuracy. As a result, training performance has become one of the major challenges limiting DNN adoption in real-world applications. Recent works have explored different parallelism strategies (i.e., data parallelism and model parallelism) and used multiple GPUs in datacenters to accelerate the training process. However, naively adopting data parallelism and model parallelism across multiple GPUs can lead to sub-optimal execution. The major reasons are i) the large amount of data movement, which prevents the system from feeding the GPUs with the required data in a timely manner (for data parallelism); and ii) low GPU utilization caused by data dependencies between layers that are placed on different devices (for model parallelism). In this paper, we identify the main challenges in adopting data parallelism and model parallelism on multi-GPU platforms. We then survey recent research that targets these challenges. We also provide an overview of our work-in-progress project on optimizing DNN training on GPUs. Our results demonstrate that simple yet effective system optimizations can further improve training scalability compared to prior works.
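
The two strategies named in the abstract can be illustrated with a toy sketch. This is not the authors' system: the scalar linear model, the equal-sized shards, the one-time-unit cost per stage, and every function name below are assumptions made only for illustration. The first part shows that averaging per-worker gradients (standing in for an all-reduce, the step where gradient data has to move between devices) reproduces the full-batch gradient. The second part shows why a naive layer-wise split leaves devices idle when a single batch passes through them one after another.

```python
from typing import List, Sequence


def mse_gradient(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_gradient(
    w: float, xs: Sequence[float], ys: Sequence[float], num_workers: int
) -> float:
    """Data parallelism (toy): every hypothetical worker keeps a full copy of w
    and computes a gradient on its own equal-sized shard of the batch.
    Averaging the local gradients stands in for an all-reduce."""
    if len(xs) % num_workers != 0:
        raise ValueError("this sketch assumes the batch splits into equal shards")
    shard = len(xs) // num_workers
    local_grads: List[float] = [
        mse_gradient(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_workers)
    ]
    return sum(local_grads) / num_workers


def naive_model_parallel_utilization(num_devices: int) -> float:
    """Model parallelism (toy): one group of layers per device and one batch in
    flight. The forward pass visits devices 0..K-1 in order and the backward
    pass visits K-1..0; each visit is assumed to cost one time unit, so exactly
    one device is busy at any time step."""
    schedule = list(range(num_devices)) + list(reversed(range(num_devices)))
    total_steps = len(schedule)
    busy_steps = sum(schedule.count(d) for d in range(num_devices))
    return busy_steps / (num_devices * total_steps)


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
    print(mse_gradient(0.5, xs, ys))               # full-batch gradient
    print(data_parallel_gradient(0.5, xs, ys, 2))  # same value from 2 shards
    print(naive_model_parallel_utilization(4))     # fraction of time devices are busy
```

Under these assumptions the average utilization of the layer-wise split is 1/K for K devices, which is the dependency-induced idleness the abstract refers to. The sketch does not model communication cost at all, so it says nothing about how severe the data-movement problem is in practice.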
