Partitioning sparse deep neural networks for scalable training and inference

ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing; Pub Date: 2021-04-23; DOI: 10.1145/3447818.3460372

G. Demirci, H. Ferhatosmanoğlu


Citation count: 7

Abstract

State-of-the-art deep neural networks (DNNs) have significant computational and data management requirements, and the sizes of both training data and models continue to increase. Sparsification and pruning methods have been shown to be effective in removing a large fraction of the connections in DNNs. The resulting sparse networks present unique challenges for further improving the computational efficiency of training and inference in deep learning. Both the feedforward (inference) and backpropagation steps of the stochastic gradient descent (SGD) algorithm for training sparse DNNs involve consecutive sparse matrix-vector multiplications (SpMVs). We first introduce a distributed-memory parallel SpMV-based solution for the SGD algorithm to improve its scalability. The parallelization approach is based on row-wise partitioning of the weight matrices that represent neuron connections between consecutive layers. We then propose a novel hypergraph model for partitioning the weight matrices to reduce the total communication volume and ensure computational load balance among processors. Experiments performed on sparse DNNs demonstrate that the proposed solution is highly efficient and scalable. Using the proposed matrix partitioning scheme further improves the performance of our solution significantly.
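The abstract does not spell out the hypergraph model, so the following is only a minimal sketch of the two quantities it says the partitioning targets: communication volume and computational load balance under row-wise partitioning. It simulates a single layer's SpMV serially rather than on distributed memory. All names (`rowwise_spmv`, `comm_volume`, `nnz_load`, `row_owner`, `x_owner`) are hypothetical, and the assumption that each input entry `x[j]` lives on one designated processor is ours, not the paper's.

```python
from typing import Dict, List

# Hypothetical sparse format: row i is a dict {column index: weight}.
SparseRows = List[Dict[int, float]]


def rowwise_spmv(W: SparseRows, x: List[float],
                 row_owner: List[int], num_procs: int) -> List[float]:
    """Compute y = W x, with each simulated processor handling only its own rows."""
    y = [0.0] * len(W)
    for p in range(num_procs):              # one iteration stands in for one processor
        for i, row in enumerate(W):
            if row_owner[i] == p:           # processor p owns row i and computes y[i]
                y[i] = sum(w * x[j] for j, w in row.items())
    return y


def comm_volume(W: SparseRows, row_owner: List[int], x_owner: List[int]) -> int:
    """Count the x entries that must be sent to a processor that does not hold them.

    Assumption: x[j] is stored on processor x_owner[j]; every other processor
    that owns a row with a nonzero in column j needs one copy of x[j].
    """
    needed_by: Dict[int, set] = {}
    for i, row in enumerate(W):
        for j in row:
            needed_by.setdefault(j, set()).add(row_owner[i])
    return sum(len(procs - {x_owner[j]}) for j, procs in needed_by.items())


def nnz_load(W: SparseRows, row_owner: List[int], num_procs: int) -> List[int]:
    """Nonzeros per processor, a simple proxy for SpMV work."""
    load = [0] * num_procs
    for i, row in enumerate(W):
        load[row_owner[i]] += len(row)
    return load


# A 4x4 sparse weight matrix and two candidate row-wise partitions on 2 processors.
W = [{0: 1.0, 3: 2.0}, {1: 3.0}, {2: 4.0}, {0: 5.0, 3: 6.0}]
x = [1.0, 1.0, 1.0, 1.0]
contiguous = [0, 0, 1, 1]   # rows 0-1 on processor 0, rows 2-3 on processor 1
grouped = [0, 1, 1, 0]      # rows sharing columns 0 and 3 placed together

for name, owner in (("contiguous", contiguous), ("grouped", grouped)):
    # Here x is assumed to be distributed the same way as the rows.
    print(name, comm_volume(W, owner, owner), nnz_load(W, owner, 2))
```

Both partitions produce the same output vector; they differ only in how many vector entries cross processor boundaries and in how evenly the nonzeros are spread. In this toy case the two goals pull in opposite directions, which is the kind of trade-off a partitioner has to weigh.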
