Accelerating Distributed Inference of Sparse Deep Neural Networks via Mitigating the Straggler Effect

Authors: M. Hasanzadeh-Mofrad, R. Melhem, Muhammad Yousuf Ahmad, Mohammad Hammoud
Venue: 2020 IEEE High Performance Extreme Computing Conference (HPEC)
DOI: 10.1109/HPEC43674.2020.9286189
Published: 2020-09-22
Citations: 3
Abstract
Once a Deep Neural Network (DNN) is trained, an inference algorithm retains the learned weights and applies them to batches of data. The trained DNN can be sparse, either because of pruning or because it follows a preset sparse connectivity pattern. Inference in such sparse networks has lower space and time complexity than inference in dense ones. Like dense DNNs, sparse DNNs can be parallelized using model or data parallelism: the former partitions the network among multiple threads, while the latter partitions the input. Model parallelism efficiently utilizes the Last Level Cache (LLC) but incurs a heavy synchronization cost because of compulsory per-layer reductions. In contrast, data parallelism allows partitions to execute independently but suffers from a straggler effect caused by load imbalance among partitions. We combine data and model parallelism through a new type of parallelism that we call data-then-model. In data-then-model, each thread starts with data parallelism, thus mitigating the per-layer synchronization cost of model parallelism. Once a thread finishes its partition, it switches to model parallelism to support a slower, still-active thread, hence alleviating the straggler effect of data parallelism. We compare data-then-model parallelism against data, model, and task-based parallelisms using the IEEE HPEC sparse DNN challenge dataset. On average, we achieve a 10% to 65% speedup over these parallelisms.
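To make the switching logic concrete, below is a minimal sketch in Python of the data-then-model idea, assuming a ReLU feed-forward network whose per-layer step is Y = ReLU(Y W). This is an illustration under stated assumptions, not the authors' implementation: all names (`owner_pass`, `helper_pass`, the queues) are hypothetical, the weights are dense stand-ins for sparse matrices, each layer is column-split in half between a straggler and one helper, and each finished worker assists at most one straggler.

```python
import queue
import threading
import numpy as np

N_WORKERS = 4
rng = np.random.default_rng(0)
LAYERS = [rng.random((64, 64)) for _ in range(8)]   # dense stand-ins for sparse weights
BATCHES = [rng.random((256 * (i + 1), 64)) for i in range(N_WORKERS)]  # imbalanced input partitions

helpers = queue.Queue()            # idle workers advertise (task_q, done_q) pairs here
results = [None] * N_WORKERS
done_count, done_lock, all_done = 0, threading.Lock(), threading.Event()

def owner_pass(wid):
    """Phase 1: data parallelism over this worker's input partition,
    switching to model parallelism once an idle helper is available."""
    y, helper = BATCHES[wid], None
    for w in LAYERS:
        if helper is None:
            try:
                helper = helpers.get_nowait()        # claim an idle worker, if any
            except queue.Empty:
                pass
        if helper is None:
            y = np.maximum(y @ w, 0.0)               # plain data-parallel layer step
        else:
            task_q, done_q = helper
            half = w.shape[1] // 2                   # model parallelism: column split
            task_q.put((y, w[:, half:]))             # helper computes the right half
            left = np.maximum(y @ w[:, :half], 0.0)
            y = np.concatenate([left, done_q.get()], axis=1)  # per-layer merge (reduction)
    if helper is not None:
        helper[0].put(None)                          # release the claimed helper
    results[wid] = y

def helper_pass():
    """Phase 2: serve column-partitioned layer tasks for a straggler."""
    task_q, done_q = queue.Queue(), queue.Queue()
    helpers.put((task_q, done_q))
    if all_done.is_set():
        task_q.put(None)                             # everyone already finished; shut down
    while True:
        item = task_q.get()
        if item is None:                             # released by owner or at shutdown
            return
        y, w_cols = item
        done_q.put(np.maximum(y @ w_cols, 0.0))

def worker(wid):
    global done_count
    owner_pass(wid)                                  # data parallelism first ...
    with done_lock:
        done_count += 1
        last = done_count == N_WORKERS
    if last:                                         # last finisher releases idle helpers
        all_done.set()
        while not helpers.empty():
            helpers.get_nowait()[0].put(None)
    else:
        helper_pass()                                # ... then help a straggler

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print([r.shape for r in results])
```

Note how the merge in `owner_pass` plays the role of the per-layer reduction that pure model parallelism would pay at every layer of every partition; in this scheme it is paid only on the layers of a straggler's partition where a helper is actually attached, which is the source of the claimed speedup.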