DDLBench: Towards a Scalable Benchmarking Infrastructure for Distributed Deep Learning

Matthijs Jansen, V. Codreanu, A. Varbanescu
{"title":"DDLBench: Towards a Scalable Benchmarking Infrastructure for Distributed Deep Learning","authors":"Matthijs Jansen, V. Codreanu, A. Varbanescu","doi":"10.1109/DLS51937.2020.00009","DOIUrl":null,"url":null,"abstract":"Due to its many applications across various fields of research, engineering, and daily life, deep learning has seen a surge in popularity. Therefore, larger and more expressive models have been proposed, with examples like Turing-NLG using as many as 17 billion parameters. Training these very large models becomes increasingly difficult due to the high computational costs and large memory footprint. Therefore, several approaches for distributed training based on data parallelism (e.g., Horovod) and model/pipeline parallelism (e.g., GPipe, PipeDream) have emerged. In this work, we focus on an in-depth comparison of three different parallelism models that address these needs: data, model and pipeline parallelism. To this end, we provide an analytical comparison of the three, both in terms of computation time and memory usage, and introduce DDLBench, a comprehensive (open-source1, ready-to-use) benchmark suite to quantify these differences in practice. Through in-depth performance analysis and experimentation with various models, datasets, distribution models and hardware systems, we demonstrate that DDLBench can accurately quantify the capability of a given system to perform distributed deep learning (DDL). By comparing our analytical models with the benchmarking results, we show how the performance of real-life implementations diverges from these analytical models, thus requiring benchmarking to capture the in-depth complexity of the frameworks themselves.1https://github.com/sara-nl/DDLBench","PeriodicalId":185533,"journal":{"name":"2020 IEEE/ACM Fourth Workshop on Deep Learning on Supercomputers (DLS)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE/ACM Fourth Workshop on Deep Learning on Supercomputers (DLS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DLS51937.2020.00009","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

Due to its many applications across various fields of research, engineering, and daily life, deep learning has seen a surge in popularity. Therefore, larger and more expressive models have been proposed, with examples like Turing-NLG using as many as 17 billion parameters. Training these very large models becomes increasingly difficult due to the high computational costs and large memory footprint. Therefore, several approaches for distributed training based on data parallelism (e.g., Horovod) and model/pipeline parallelism (e.g., GPipe, PipeDream) have emerged. In this work, we focus on an in-depth comparison of three different parallelism models that address these needs: data, model and pipeline parallelism. To this end, we provide an analytical comparison of the three, both in terms of computation time and memory usage, and introduce DDLBench, a comprehensive (open-source¹, ready-to-use) benchmark suite to quantify these differences in practice. Through in-depth performance analysis and experimentation with various models, datasets, distribution models and hardware systems, we demonstrate that DDLBench can accurately quantify the capability of a given system to perform distributed deep learning (DDL). By comparing our analytical models with the benchmarking results, we show how the performance of real-life implementations diverges from these analytical models, thus requiring benchmarking to capture the in-depth complexity of the frameworks themselves.

¹ https://github.com/sara-nl/DDLBench
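The abstract refers to analytical models of computation time for the three parallelism approaches. As a hedged illustration (notation assumed here, not taken from the paper), first-order cost models for data and pipeline parallelism are commonly sketched as follows, where N is the number of workers, T_1 the single-device compute time per step, P the gradient volume in bytes, B the interconnect bandwidth, M the number of pipeline stages, and m the number of micro-batches per batch:

```latex
% Hedged first-order cost sketches; notation is assumed, not the paper's own.
\begin{align*}
T_{\mathrm{data}} &\approx \frac{T_1}{N} + 2\,\frac{N-1}{N}\cdot\frac{P}{B}
  && \text{(compute shard + ring all-reduce)} \\
T_{\mathrm{pipe}} &\approx (M + m - 1)\,t_{\mathrm{stage}},
  \quad t_{\mathrm{stage}} \approx \frac{T_1}{M\,m}
  && \text{(fill/drain bubble of } M-1 \text{ stage slots)}
\end{align*}
```

Idealized models like these predict near-linear scaling until the communication term or the pipeline bubble dominates; the paper's point is precisely that real implementations diverge from such models, which is why DDLBench measures them empirically. To make the data-parallelism case concrete, below is a minimal PyTorch sketch of synchronous data parallelism using DistributedDataParallel, which follows the same gradient all-reduce pattern that Horovod implements. It is illustrative only, not code from DDLBench; the single-process gloo setup exists purely so the snippet runs standalone, and a real run would launch one rank per device (e.g., via torchrun).

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Single-process "cluster" purely for illustration; a real data-parallel
    # run launches one process per device (e.g., torchrun --nproc_per_node=4).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(backend="gloo", rank=0, world_size=1)

    # Every rank holds a full model replica (data parallelism). Model
    # parallelism would instead split these layers across devices, and
    # pipeline parallelism would additionally stream micro-batches through
    # the resulting stages.
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    ddp_model = DDP(model)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)

    # Each rank trains on its own shard of the global batch; gradients are
    # averaged across ranks by an all-reduce that DDP runs during backward().
    inputs = torch.randn(8, 32)
    targets = torch.randint(0, 10, (8,))
    loss = nn.functional.cross_entropy(ddp_model(inputs), targets)
    loss.backward()   # gradient all-reduce happens here
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```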