MemFlow: Memory-Aware Distributed Deep Learning

Neil Band
{"title":"MemFlow:内存感知分布式深度学习","authors":"Neil Band","doi":"10.1145/3318464.3384416","DOIUrl":null,"url":null,"abstract":"As the number of layers and the amount of training data increases, the trend is to train deep neural networks in parallel across devices. In such scenarios, neural network training is increasingly bottlenecked by high memory requirements posed by intermediate results, or feature maps, that are produced during the forward pass and consumed during the backward pass. We recognize that the best-performing device parallelization configurations should consider memory usage in addition to the canonical metric of computation time. Towards this we introduce MemFlow, an optimization framework for distributed deep learning that performs joint optimization over memory usage and computation time when searching for a parallelization strategy. MemFlow consists of: (i) a task graph with memory usage estimates; (ii) a memory-aware execution simulator; and (iii) a Markov Chain Monte Carlo search algorithm that considers various degrees of recomputation i.e., discarding feature maps during the forward pass and recomputing them during the backward pass. Our experiments demonstrate that under memory constraints, MemFlow can readily locate valid and superior parallelization strategies unattainable with previous frameworks.","PeriodicalId":436122,"journal":{"name":"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data","volume":"52 6","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"MemFlow: Memory-Aware Distributed Deep Learning\",\"authors\":\"Neil Band\",\"doi\":\"10.1145/3318464.3384416\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As the number of layers and the amount of training data increases, the trend is to train deep neural networks in parallel across devices. In such scenarios, neural network training is increasingly bottlenecked by high memory requirements posed by intermediate results, or feature maps, that are produced during the forward pass and consumed during the backward pass. We recognize that the best-performing device parallelization configurations should consider memory usage in addition to the canonical metric of computation time. Towards this we introduce MemFlow, an optimization framework for distributed deep learning that performs joint optimization over memory usage and computation time when searching for a parallelization strategy. MemFlow consists of: (i) a task graph with memory usage estimates; (ii) a memory-aware execution simulator; and (iii) a Markov Chain Monte Carlo search algorithm that considers various degrees of recomputation i.e., discarding feature maps during the forward pass and recomputing them during the backward pass. 
Our experiments demonstrate that under memory constraints, MemFlow can readily locate valid and superior parallelization strategies unattainable with previous frameworks.\",\"PeriodicalId\":436122,\"journal\":{\"name\":\"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data\",\"volume\":\"52 6\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-06-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3318464.3384416\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3318464.3384416","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 9

Abstract

As the number of layers and the amount of training data increase, the trend is to train deep neural networks in parallel across devices. In such scenarios, neural network training is increasingly bottlenecked by the high memory requirements posed by intermediate results, or feature maps, that are produced during the forward pass and consumed during the backward pass. We recognize that the best-performing device parallelization configurations should consider memory usage in addition to the canonical metric of computation time. Towards this we introduce MemFlow, an optimization framework for distributed deep learning that performs joint optimization over memory usage and computation time when searching for a parallelization strategy. MemFlow consists of: (i) a task graph with memory usage estimates; (ii) a memory-aware execution simulator; and (iii) a Markov Chain Monte Carlo search algorithm that considers various degrees of recomputation, i.e., discarding feature maps during the forward pass and recomputing them during the backward pass. Our experiments demonstrate that under memory constraints, MemFlow can readily locate valid and superior parallelization strategies unattainable with previous frameworks.
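The abstract describes three components: a task graph annotated with memory estimates, a memory-aware execution simulator, and an MCMC search over parallelization strategies that may recompute feature maps. The sketch below only illustrates that general idea under stated assumptions; it is not MemFlow's implementation or API. The Layer and Strategy structures, the toy simulator, the infeasibility penalty, and all constants are hypothetical choices made for the example.

```python
# Minimal sketch of a memory-aware MCMC search over parallelization strategies.
# All names and the cost model are illustrative assumptions, not MemFlow's code.
import math
import random
from dataclasses import dataclass


@dataclass
class Layer:
    compute_time: float    # simulated forward+backward time (ms), assumed
    feature_map_mb: float  # size of the layer's intermediate activations (MB)


# A "strategy" places each layer on a device and marks whether its feature map
# is recomputed (discarded in the forward pass, rebuilt in the backward pass).
@dataclass
class Strategy:
    device: list     # device id per layer
    recompute: list  # bool per layer


def simulate(layers, s, mem_limit_mb, num_devices):
    """Toy execution simulator: returns (simulated time, memory-feasible?)."""
    per_device_mem = [0.0] * num_devices
    total_time = 0.0
    for layer, dev, rc in zip(layers, s.device, s.recompute):
        # Recomputation trades memory for roughly one extra forward pass.
        total_time += layer.compute_time * (2.0 if rc else 1.0)
        if not rc:
            per_device_mem[dev] += layer.feature_map_mb
    feasible = all(m <= mem_limit_mb for m in per_device_mem)
    return total_time, feasible


def mcmc_search(layers, mem_limit_mb, num_devices, steps=5000, temp=10.0):
    """Metropolis-style search over (placement, recomputation) choices."""
    n = len(layers)
    cur = Strategy([random.randrange(num_devices) for _ in range(n)], [False] * n)

    def cost(s):
        t, ok = simulate(layers, s, mem_limit_mb, num_devices)
        return t if ok else t + 1e6  # heavy penalty for infeasible strategies

    best, best_cost, cur_cost = cur, cost(cur), cost(cur)
    for _ in range(steps):
        nxt = Strategy(list(cur.device), list(cur.recompute))
        i = random.randrange(n)
        if random.random() < 0.5:
            nxt.device[i] = random.randrange(num_devices)  # move a layer
        else:
            nxt.recompute[i] = not nxt.recompute[i]        # toggle recompute
        c = cost(nxt)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if c < cur_cost or random.random() < math.exp((cur_cost - c) / temp):
            cur, cur_cost = nxt, c
            if c < best_cost:
                best, best_cost = nxt, c
    return best, best_cost


if __name__ == "__main__":
    random.seed(0)
    layers = [Layer(compute_time=5.0, feature_map_mb=200.0) for _ in range(16)]
    strategy, t = mcmc_search(layers, mem_limit_mb=1000.0, num_devices=4)
    print(f"best simulated time: {t:.1f} ms, "
          f"recomputed layers: {sum(strategy.recompute)}")
```

In this toy model, marking a layer for recomputation frees its feature-map memory on its device at the cost of extra forward computation, which is the memory/time trade-off the search must weigh against the device memory limit.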