Train Throughput Analysis of Distributed Reinforcement Learning

Sooyoung Jang, Noh-Sam Park
DOI: 10.1109/ICTC49870.2020.9289179
Published in: 2020 International Conference on Information and Communication Technology Convergence (ICTC)
Publication date: 2020-10-21
Citation count: 0

Abstract

Distributed deep reinforcement learning can increase the train throughput, defined as the number of timesteps per second used for training, simply by adding computing nodes to a cluster, which makes it an essential technique for solving complex problems. The more complicated the virtual learning environment and the policy network become, the more CPU computing power is required in the rollout phase and the more GPU computing power is required in the policy update phase. Recall that reinforcement learning iterates, over millions of iterations, between acquiring data through rollouts in the virtual learning environment and updating the policy from that data. In this paper, the train throughput analysis is performed with RLlib and IMPALA on two different problems: CartPole, a simple problem, and Pong, a relatively complex one. The effects of various scalability metrics, clustering, and observation dimensions on train throughput are analyzed. Throughout the analysis, we show that 1) the train throughput varies significantly with the scalability metrics, 2) it is vital to monitor the bottleneck in the train throughput and configure the cluster accordingly, and 3) when GPU computing power is the bottleneck, reducing the observation dimensions can be a great option, as the train throughput increases up to 3x when the dimension is reduced from 84 to 42.
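The throughput metric the abstract defines can be stated precisely. A minimal sketch (the function name and sample numbers are illustrative, not from the paper):

```python
def train_throughput(timesteps: int, elapsed_seconds: float) -> float:
    """Train throughput: environment timesteps consumed for training
    per second of wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return timesteps / elapsed_seconds

# Example: 1,200,000 timesteps processed in 60 s of training.
print(train_throughput(1_200_000, 60.0))  # 20000.0 timesteps/s
```

Adding rollout nodes raises the numerator per unit time until some other resource (e.g. the GPU in the policy-update phase) becomes the bottleneck.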
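The cluster configuration the abstract refers to can be sketched with RLlib's IMPALA trainer. This is a hypothetical configuration, not the authors' exact setup, assuming the Ray/RLlib API of the paper's era (circa Ray 1.x); the specific worker and environment counts are illustrative:

```python
import ray
from ray import tune

# Join an existing Ray cluster; adding nodes scales the CPU-bound rollout phase.
ray.init(address="auto")

tune.run(
    "IMPALA",
    stop={"timesteps_total": 10_000_000},
    config={
        "env": "PongNoFrameskip-v4",
        "num_workers": 32,         # rollout workers (CPU-bound phase)
        "num_envs_per_worker": 5,  # batch several environments per worker
        "num_gpus": 1,             # learner GPU (policy-update phase)
    },
)
```

The paper's second finding maps directly onto these knobs: if the learner GPU sits idle, add rollout workers; if rollout data queues up faster than the GPU can consume it, more workers will not help.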
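The third finding, shrinking observations from 84 to 42 per side, works because the per-update GPU cost scales with pixel count. A minimal sketch of naive subsampling (the helper is illustrative; the paper does not specify its exact preprocessing):

```python
def downsample(frame, factor=2):
    """Keep every `factor`-th pixel along each axis of a 2-D frame."""
    return [row[::factor] for row in frame[::factor]]

frame_84 = [[0] * 84 for _ in range(84)]  # stand-in for an 84x84 grayscale frame
frame_42 = downsample(frame_84)
print(len(frame_42), len(frame_42[0]))  # 42 42
```

Pixel count drops 4x (84*84 = 7056 to 42*42 = 1764), cutting the GPU work per policy update; the paper reports up to a 3x throughput gain from this change when the GPU is the bottleneck.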