{"title":"基于错误有界有损压缩的新型高效记忆深度学习训练框架","authors":"Sian Jin, Guanpeng Li, S. Song, Dingwen Tao","doi":"10.1145/3437801.3441597","DOIUrl":null,"url":null,"abstract":"DNNs are becoming increasingly deeper, wider, and nonlinear due to the growing demands on prediction accuracy and analysis quality. When training a DNN model, the intermediate activation data must be saved in the memory during forward propagation and then restored for backward propagation. Traditional memory saving techniques such as data recomputation and migration either suffers from a high performance overhead or is constrained by specific interconnect technology and limited bandwidth. In this paper, we propose a novel memory-driven high performance CNN training framework that leverages error-bounded lossy compression to significantly reduce the memory requirement for training in order to allow training larger neural networks. Specifically, we provide theoretical analysis and then propose an improved lossy compressor and an adaptive scheme to dynamically configure the lossy compression error-bound and adjust the training batch size to further utilize the saved memory space for additional speedup. We evaluate our design against state-of-the-art solutions with four widely-adopted CNNs and the ImangeNet dataset. Results demonstrate that our proposed framework can significantly reduce the training memory consumption by up to 13.5× and 1.8× over the baseline training and state-of-the-art framework with compression, respectively, with little or no accuracy loss. The full paper can be referred to at https://arxiv.org/abs/2011.09017.","PeriodicalId":124852,"journal":{"name":"Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"A novel memory-efficient deep learning training framework via error-bounded lossy compression\",\"authors\":\"Sian Jin, Guanpeng Li, S. Song, Dingwen Tao\",\"doi\":\"10.1145/3437801.3441597\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"DNNs are becoming increasingly deeper, wider, and nonlinear due to the growing demands on prediction accuracy and analysis quality. When training a DNN model, the intermediate activation data must be saved in the memory during forward propagation and then restored for backward propagation. Traditional memory saving techniques such as data recomputation and migration either suffers from a high performance overhead or is constrained by specific interconnect technology and limited bandwidth. In this paper, we propose a novel memory-driven high performance CNN training framework that leverages error-bounded lossy compression to significantly reduce the memory requirement for training in order to allow training larger neural networks. Specifically, we provide theoretical analysis and then propose an improved lossy compressor and an adaptive scheme to dynamically configure the lossy compression error-bound and adjust the training batch size to further utilize the saved memory space for additional speedup. We evaluate our design against state-of-the-art solutions with four widely-adopted CNNs and the ImangeNet dataset. 
Results demonstrate that our proposed framework can significantly reduce the training memory consumption by up to 13.5× and 1.8× over the baseline training and state-of-the-art framework with compression, respectively, with little or no accuracy loss. The full paper can be referred to at https://arxiv.org/abs/2011.09017.\",\"PeriodicalId\":124852,\"journal\":{\"name\":\"Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming\",\"volume\":\"43 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3437801.3441597\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3437801.3441597","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A novel memory-efficient deep learning training framework via error-bounded lossy compression
DNNs are becoming deeper, wider, and more nonlinear due to the growing demands for prediction accuracy and analysis quality. When training a DNN model, the intermediate activation data must be saved in memory during forward propagation and then restored for backward propagation. Traditional memory-saving techniques such as data recomputation and migration either suffer from high performance overhead or are constrained by specific interconnect technology and limited bandwidth. In this paper, we propose a novel memory-driven high-performance CNN training framework that leverages error-bounded lossy compression to significantly reduce the memory requirement for training, allowing larger neural networks to be trained. Specifically, we provide a theoretical analysis and then propose an improved lossy compressor and an adaptive scheme that dynamically configures the lossy compression error bound and adjusts the training batch size, further exploiting the saved memory space for additional speedup. We evaluate our design against state-of-the-art solutions with four widely adopted CNNs and the ImageNet dataset. Results demonstrate that our proposed framework can reduce training memory consumption by up to 13.5× over baseline training and by up to 1.8× over a state-of-the-art compression-based framework, with little or no accuracy loss. The full paper is available at https://arxiv.org/abs/2011.09017.
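
To make the core idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of how intermediate activations could be stored in a lossy, error-bounded form between the forward and backward passes. It uses plain uniform quantization as a stand-in for a real error-bounded lossy compressor, and PyTorch autograd purely for illustration; the class name LossyReLU and the error_bound parameter are assumptions made for this example only.

# Hypothetical sketch: compress the stashed activation under an absolute
# error bound in forward, decompress it on demand in backward.
import torch

class LossyReLU(torch.autograd.Function):
    """ReLU whose saved activation is kept in a compact, lossy form."""

    @staticmethod
    def forward(ctx, x, error_bound=1e-2):
        y = torch.relu(x)
        # Uniform quantization with step 2*error_bound keeps the
        # reconstruction within +/- error_bound of the true activation.
        step = 2.0 * error_bound
        codes = torch.round(y / step).to(torch.int16)  # compact integer codes
        ctx.save_for_backward(codes)
        ctx.step = step
        return y

    @staticmethod
    def backward(ctx, grad_out):
        (codes,) = ctx.saved_tensors
        y_approx = codes.to(grad_out.dtype) * ctx.step  # decompress on demand
        grad_in = grad_out * (y_approx > 0).to(grad_out.dtype)  # ReLU mask from the lossy copy
        return grad_in, None  # no gradient w.r.t. error_bound

x = torch.randn(8, 64, requires_grad=True)
loss = LossyReLU.apply(x, 1e-2).sum()
loss.backward()  # backward uses the decompressed activations

The actual framework's compressor, error-bound selection, and batch-size adjustment are considerably more sophisticated; the sketch only shows where compression and decompression would sit relative to the forward and backward passes.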