Model-Parallel Model Selection for Deep Learning Systems

Proceedings of the 2021 International Conference on Management of Data. Pub Date: 2021-06-09. DOI: 10.1145/3448016.3450571

Kabir Nagrecha

Citations: 12

Abstract

As deep learning grows more expensive in both time and compute, inefficiencies in training put state-of-the-art models out of practical reach for most users. The newest model architectures are simply too large to fit on a single processor. To address this, many ML practitioners have turned to model parallelism, which distributes a model's computational requirements across several devices. Unfortunately, the sequential nature of neural networks leads to very low efficiency and device utilization in model-parallel training jobs. We propose a new form of "shard parallelism" that combines task parallelism and model parallelism, and package it into a framework we name Hydra. Hydra recasts the problem of model parallelism in the multi-model context to produce a fine-grained parallel workload of independent model shards, rather than independent models. This new parallel design promises dramatic speedups relative to the traditional model parallelism paradigm.
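The utilization argument in the abstract can be illustrated with a toy step-count model. The sketch below is not Hydra's scheduler. It assumes every shard takes one time step, that any ready shard can run on any free device, and that backward passes, memory limits, and transfer costs can be ignored. The function names (`sequential_model_parallel_steps`, `shard_parallel_steps`, `utilization`) are hypothetical and chosen only for this illustration.

```python
def sequential_model_parallel_steps(num_models, shards_per_model):
    """Baseline: models are trained one after another, and within a model
    the shards run strictly in sequence, so only one device is busy per step."""
    return num_models * shards_per_model


def shard_parallel_steps(num_models, shards_per_model, num_devices):
    """Toy greedy schedule over shards drawn from many models.

    Each model is a chain of unit-time shards; a shard is ready once the
    previous shard of the same model has finished. At every step, each device
    takes the next ready shard from a different model, preferring the models
    with the most shards still to run.
    """
    remaining = [shards_per_model] * num_models
    steps = 0
    while any(remaining):
        ready = [m for m in range(num_models) if remaining[m] > 0]
        ready.sort(key=lambda m: -remaining[m])
        for m in ready[:num_devices]:
            remaining[m] -= 1  # run one shard of model m on a free device
        steps += 1
    return steps


def utilization(num_models, shards_per_model, num_devices, steps):
    """Fraction of device-steps that did useful shard work."""
    return (num_models * shards_per_model) / (steps * num_devices)


if __name__ == "__main__":
    models, shards, devices = 4, 4, 4
    base = sequential_model_parallel_steps(models, shards)
    mixed = shard_parallel_steps(models, shards, devices)
    print("baseline steps:", base,
          "utilization:", utilization(models, shards, devices, base))
    print("shard-parallel steps:", mixed,
          "utilization:", utilization(models, shards, devices, mixed))
```

Under these simplifying assumptions, four models of four shards each on four devices take 16 steps at 25% utilization in the baseline and 4 steps at full utilization in the toy shard-level schedule. Real speedups depend on factors this sketch leaves out, such as unequal shard runtimes and data movement between devices.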
