Studying the Effects of Hashing of Sparse Deep Neural Networks on Data and Model Parallelisms
M. Hasanzadeh-Mofrad, R. Melhem, Muhammad Yousuf Ahmad, Mohammad Hammoud
2020 IEEE High Performance Extreme Computing Conference (HPEC), published 2020-09-22
DOI: 10.1109/HPEC43674.2020.9286195
Citations: 6
Abstract
Deep Neural Network (DNN) training and inference are two resource-intensive tasks that are usually scaled out using data or model parallelism, where data parallelism parallelizes over the input data and model parallelism parallelizes over the network. Dense matrix-matrix multiplication is the key primitive behind training and inference of dense DNNs. Sparse DNNs, by contrast, are less resource-intensive than their dense counterparts while offering comparable accuracy. They can likewise be parallelized using data or model parallelism, with Sparse Matrix-Matrix Multiplication (SpMM) as the key primitive. To scale out, both data and model parallelisms initially use data parallelism to partition the input data among multiple machines. This initial partitioning makes the performance of both approaches prone to load imbalance, as partitions may be unevenly sized. In this paper, we take a deeper look into data and model parallelisms and closely study the mechanics of the SpMM used by each. Moreover, to remedy their load imbalance problem, we incorporate hashing as a simple yet powerful method for balancing the partitions. Finally, we use the IEEE HPEC sparse DNN challenge dataset to evaluate the performance of data and model parallelisms at scale. Scaling up to 32 machines (896 cores), we inferred a large sparse DNN with 4B parameters in 51 seconds. Results suggest that with hashing, data and model parallelisms achieve super-linear speedup due to better load balance and cache utilization.
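The sketch below is not the authors' implementation; it is a minimal illustration of the idea the abstract describes: when the input sparse matrix has skewed row density, a contiguous (block) row split assigns very different nonzero counts to different machines, whereas assigning rows by a hash of their index spreads the nonzeros more evenly. The hash function, partition count, and synthetic skew used here are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp


def block_partition(n_rows, n_parts):
    """Assign rows to partitions in contiguous blocks (the baseline split)."""
    block = (n_rows + n_parts - 1) // n_parts
    return np.arange(n_rows) // block


def hash_partition(n_rows, n_parts):
    """Assign each row to a partition by hashing its index (illustrative hash)."""
    # Fibonacci-style multiplicative hash; shift before the modulo to mix bits.
    h = (np.arange(n_rows, dtype=np.uint64) * np.uint64(0x9E3779B1)) >> np.uint64(16)
    return (h % np.uint64(n_parts)).astype(np.int64)


def nnz_per_partition(A, assignment, n_parts):
    """Count the nonzeros each partition would own under a row assignment."""
    row_nnz = np.diff(A.tocsr().indptr)  # nonzeros per row
    return np.bincount(np.asarray(assignment, dtype=np.int64),
                       weights=row_nnz, minlength=n_parts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, n_parts = 10_000, 1024, 8

    # Skewed sparsity: early rows are much denser than later rows,
    # so a contiguous split overloads the first partitions.
    density = np.linspace(0.05, 0.001, n)
    rows = np.repeat(np.arange(n), rng.binomial(m, density))
    cols = rng.integers(0, m, size=rows.size)
    A = sp.csr_matrix((np.ones(rows.size), (rows, cols)), shape=(n, m))

    for name, assign in [("block", block_partition(n, n_parts)),
                         ("hash", hash_partition(n, n_parts))]:
        loads = nnz_per_partition(A, assign, n_parts)
        print(f"{name:5s} partitioning: max/mean nnz per partition = "
              f"{loads.max() / loads.mean():.2f}")
```

With this synthetic skew, the block split yields a max/mean load ratio well above 1, while the hashed assignment stays close to 1, which is the load-balance effect the paper attributes to hashing before running the per-partition SpMM.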