{"title":"面向深度学习推荐模型的语义感知无损数据压缩","authors":"S. Pumma, Abhinav Vishnu","doi":"10.1109/mlhpc54614.2021.00006","DOIUrl":null,"url":null,"abstract":"As the architectures and capabilities of deep neural networks evolve, they become more sophisticated to train and use. Deep Learning Recommendation Model (DLRM), a new neural network for recommendation systems, introduces challenging requirements for deep neural network training and inference. The size of the DLRM model is typically large and not able to fit on a single GPU memory. Unlike other deep neural networks, DLRM requires both model-parallel and data-parallel for the bottom part and top part of the model when running on multiple GPUs. Due to the hybrid-parallel model, the all-to-all communication is used for welding the top and bottom parts together. We have observed that the all-to-all communication is costly and is a bottleneck in the DLRM training/inference. In this paper, we propose a novel approach to reduce the communication volume by using DLRM’s properties to compress the transferred data without information loss. We demonstrate benefits of our method by training DLRM MLPerf on eight AMD Instinc$\\mathrm{t}^{\\mathrm{T}\\mathrm{M}}$ MI100 accelerators. The experimental results show 59% and 38% improvement in the time-to-solution of the DLRM MLPerf training for FP32 and mixed-precision, respectively.","PeriodicalId":101642,"journal":{"name":"2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Semantic-Aware Lossless Data Compression for Deep Learning Recommendation Model (DLRM)\",\"authors\":\"S. Pumma, Abhinav Vishnu\",\"doi\":\"10.1109/mlhpc54614.2021.00006\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As the architectures and capabilities of deep neural networks evolve, they become more sophisticated to train and use. Deep Learning Recommendation Model (DLRM), a new neural network for recommendation systems, introduces challenging requirements for deep neural network training and inference. The size of the DLRM model is typically large and not able to fit on a single GPU memory. Unlike other deep neural networks, DLRM requires both model-parallel and data-parallel for the bottom part and top part of the model when running on multiple GPUs. Due to the hybrid-parallel model, the all-to-all communication is used for welding the top and bottom parts together. We have observed that the all-to-all communication is costly and is a bottleneck in the DLRM training/inference. In this paper, we propose a novel approach to reduce the communication volume by using DLRM’s properties to compress the transferred data without information loss. We demonstrate benefits of our method by training DLRM MLPerf on eight AMD Instinc$\\\\mathrm{t}^{\\\\mathrm{T}\\\\mathrm{M}}$ MI100 accelerators. 
The experimental results show 59% and 38% improvement in the time-to-solution of the DLRM MLPerf training for FP32 and mixed-precision, respectively.\",\"PeriodicalId\":101642,\"journal\":{\"name\":\"2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC)\",\"volume\":\"54 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/mlhpc54614.2021.00006\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/mlhpc54614.2021.00006","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Semantic-Aware Lossless Data Compression for Deep Learning Recommendation Model (DLRM)
As the architectures and capabilities of deep neural networks evolve, they become increasingly complex to train and use. The Deep Learning Recommendation Model (DLRM), a recent neural network for recommendation systems, introduces challenging requirements for training and inference. A DLRM model is typically too large to fit in the memory of a single GPU. Unlike other deep neural networks, DLRM requires model parallelism for the bottom part of the model and data parallelism for the top part when running on multiple GPUs. Because of this hybrid-parallel scheme, all-to-all communication is used to stitch the bottom and top parts together. We have observed that this all-to-all communication is costly and becomes a bottleneck in DLRM training and inference. In this paper, we propose a novel approach that reduces the communication volume by using DLRM's properties to compress the transferred data without information loss. We demonstrate the benefits of our method by training the MLPerf DLRM benchmark on eight AMD Instinct™ MI100 accelerators. The experimental results show 59% and 38% improvements in time-to-solution for FP32 and mixed-precision training, respectively.
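To make the role of that all-to-all concrete, below is a minimal sketch, not taken from the paper, of where the collective sits in hybrid-parallel DLRM training, written against PyTorch's torch.distributed API. In the usual DLRM setup, each rank owns a subset of the embedding tables (model parallel) and looks them up for the whole global batch, while the top MLP runs data parallel on a batch shard; the all-to-all converts between these two layouts, and its payload is the traffic the paper compresses. The function name bottom_to_top_exchange and all tensor sizes are illustrative assumptions, and the compression scheme itself is not reproduced here.

# Illustrative sketch only -- not the paper's code. Launch with, e.g.:
#   torchrun --nproc_per_node=8 dlrm_alltoall_sketch.py
import os

import torch
import torch.distributed as dist


def bottom_to_top_exchange(local_emb_out: torch.Tensor, world_size: int) -> torch.Tensor:
    """Redistribute embedding-lookup outputs from the model-parallel bottom
    layout to the data-parallel top layout.

    Input  (bottom, model parallel): this rank owns a few embedding tables and
    has looked them up for the whole global batch ->
        local_emb_out: [global_batch, local_tables * emb_dim]
    Output (top, data parallel): this rank keeps only its batch shard, but now
    for every rank's tables ->
        [global_batch // world_size, world_size * local_tables * emb_dim]
    """
    # One chunk of the global batch per destination rank.
    send_chunks = list(local_emb_out.chunk(world_size, dim=0))
    recv_chunks = [torch.empty_like(chunk) for chunk in send_chunks]

    # This payload is the all-to-all traffic the paper targets: it would be
    # losslessly compressed before the exchange and decompressed afterwards
    # (the compression scheme itself is not shown in this sketch).
    dist.all_to_all(recv_chunks, send_chunks)

    # recv_chunks[i] holds rank i's tables' embeddings for this rank's batch
    # shard; concatenate along the feature dimension to feed the top MLP.
    return torch.cat(recv_chunks, dim=1)


if __name__ == "__main__":
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")  # maps to RCCL on AMD GPUs
    world_size = dist.get_world_size()

    # Toy sizes; real DLRM batches and embedding dimensions are much larger.
    global_batch, local_tables, emb_dim = 8 * world_size, 2, 64
    local_emb_out = torch.randn(global_batch, local_tables * emb_dim, device="cuda")

    top_input = bottom_to_top_exchange(local_emb_out, world_size)
    print(f"rank {dist.get_rank()}: top-MLP input shape {tuple(top_input.shape)}")
    dist.destroy_process_group()

Under this layout, any reduction in the size of the exchanged chunks translates directly into less all-to-all traffic, which is where the paper attributes its reported 59% (FP32) and 38% (mixed-precision) time-to-solution improvements.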