{"title":"A Novel Inference Algorithm for Large Sparse Neural Network using Task Graph Parallelism","authors":"Dian-Lun Lin, Tsung-Wei Huang","doi":"10.1109/HPEC43674.2020.9286218","DOIUrl":null,"url":null,"abstract":"The ever-increasing size of modern deep neural network (DNN) architectures has put increasing strain on the hardware needed to implement them. Sparsified DNNs can greatly reduce memory costs and increase throughput over standard DNNs, if the loss of accuracy can be adequately controlled. However, sparse DNNs present unique computational challenges. Efficient model or data parallelism algorithms are extremely hard to design and implement. The recent effort MIT/IEEE/Amazon HPEC Graph Challenge has drawn attention to high-performance inference methods for large sparse DNNs. In this paper, we introduce SNIG, an efficient inference engine for large sparse DNN s. SNIG develops highly optimized inference kernels and leverages the power of CUDA Graphs to enable efficient decomposition of model and data parallelisms. Our decomposition strategy is flexible and scalable to different partitions of data volumes, model sizes, and GPU numbers. We have evaluated SNIG on the official benchmarks of HPEC Sparse DNN Challenge and demonstrated its promising performance scalable from a single GPU to multiple GPUs. Compared to the champion of the 2019 HPEC Sparse DNN Challenge, SNIG can finish all inference workloads using only a single GPU. At the largest DNN, which has more than 4 billion parameters across 1920 layers each of 65536 neurons, SNIG is up to 2.3x faster than a state-of-the-art baseline under a machine of 4 GPUs.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC43674.2020.9286218","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 14
Abstract
The ever-increasing size of modern deep neural network (DNN) architectures has put increasing strain on the hardware needed to run them. Sparsified DNNs can greatly reduce memory cost and increase throughput over dense DNNs, provided the loss of accuracy is adequately controlled. However, sparse DNNs present unique computational challenges: efficient model- or data-parallel algorithms are extremely hard to design and implement. The recent MIT/IEEE/Amazon HPEC Graph Challenge has drawn attention to high-performance inference methods for large sparse DNNs. In this paper, we introduce SNIG, an efficient inference engine for large sparse DNNs. SNIG develops highly optimized inference kernels and leverages the power of CUDA Graphs to enable efficient decomposition of model and data parallelism. Our decomposition strategy is flexible and scales across different partitions of data volume, model size, and GPU count. We have evaluated SNIG on the official benchmarks of the HPEC Sparse DNN Challenge and demonstrated promising performance that scales from a single GPU to multiple GPUs. Compared to the champion of the 2019 HPEC Sparse DNN Challenge, SNIG can finish all inference workloads using only a single GPU. On the largest DNN, which has more than 4 billion parameters across 1920 layers of 65536 neurons each, SNIG is up to 2.3x faster than a state-of-the-art baseline on a machine with 4 GPUs.
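To make the idea of chaining sparse-layer inference kernels through CUDA Graphs concrete, the following is a minimal, hypothetical sketch; it is not SNIG's actual implementation, and the names (spmm_relu, LayerCSR, build_inference_graph), the CSR layout, and the single-sample formulation are assumptions made for illustration. It captures a sequence of per-layer sparse kernels into one graph that can then be replayed cheaply for each input batch.

```cuda
// Hypothetical sketch: chaining sparse-layer kernels into a single CUDA Graph.
// Names and data layout are illustrative, not SNIG's actual API.
#include <cuda_runtime.h>
#include <vector>
#include <utility>

struct LayerCSR {           // one sparse weight matrix in CSR form (device pointers)
  const int*   row_ptr;     // [num_neurons + 1]
  const int*   col_idx;     // [nnz]
  const float* val;         // [nnz]
};

// One inference layer for a single input sample: y_out = ReLU(W * y_in + bias)
__global__ void spmm_relu(LayerCSR w, const float* y_in, float* y_out,
                          int num_neurons, float bias) {
  int row = blockIdx.x * blockDim.x + threadIdx.x;
  if (row >= num_neurons) return;
  float acc = bias;
  for (int k = w.row_ptr[row]; k < w.row_ptr[row + 1]; ++k) {
    acc += w.val[k] * y_in[w.col_idx[k]];
  }
  y_out[row] = acc > 0.f ? acc : 0.f;   // ReLU
}

// Capture the whole layer pipeline once, then replay it per input batch
// with cudaGraphLaunch(exec, stream).
cudaGraphExec_t build_inference_graph(const std::vector<LayerCSR>& layers,
                                      float* buf_a, float* buf_b,
                                      int num_neurons, float bias,
                                      cudaStream_t stream) {
  cudaGraph_t graph;
  cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
  float* in  = buf_a;
  float* out = buf_b;
  int threads = 256;
  int blocks  = (num_neurons + threads - 1) / threads;
  for (const auto& w : layers) {
    spmm_relu<<<blocks, threads, 0, stream>>>(w, in, out, num_neurons, bias);
    std::swap(in, out);                 // ping-pong activation buffers
  }
  cudaStreamEndCapture(stream, &graph);
  cudaGraphExec_t exec;
  cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
  cudaGraphDestroy(graph);
  return exec;
}
```

In a multi-GPU data-parallel setting of the kind the abstract describes, each GPU could hold its own instantiated graph and process a disjoint partition of the input batch; the sketch above omits that partitioning and any model-parallel splitting of layers across devices.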