{"title":"分布式强化学习的训练吞吐量分析","authors":"Sooyoung Jang, Noh-Sam Park","doi":"10.1109/ICTC49870.2020.9289179","DOIUrl":null,"url":null,"abstract":"Distributed deep reinforcement learning can increase the train throughput, which is defined as the timesteps per second used for training, easily by just adding computing nodes to a cluster, which makes it an essential technique for solving complex problems. The more complicated the virtual learning environment and the policy network become, the more the CPU computing power in the rollout phase and the GPU computing power in the policy update phase is required. Recall that the reinforcement learning iterates the phases of acquiring data through rollout in the virtual learning environment and updating the policy from that data over millions of iterations. In this paper, the train throughput analysis is performed with RLlib and IMPALA on two different problems: CartPole, a simple problem, and Pong, a relatively complex problem. The effects of various scalability metrics, clustering, and observation dimensions on train throughput are analyzed. Throughout the analysis, we show that 1) the train throughput varies significantly according to the scalability metrics, 2) it is vital to monitor the bottleneck in the train throughput and configure the cluster accordingly, and 3) when the GPU computing power is the bottleneck, reducing the observation dimensions can be a great option as the train throughput increases up to 3 times by reducing the dimension from 84 to 42.","PeriodicalId":282243,"journal":{"name":"2020 International Conference on Information and Communication Technology Convergence (ICTC)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Train Throughput Analysis of Distributed Reinforcement Learning\",\"authors\":\"Sooyoung Jang, Noh-Sam Park\",\"doi\":\"10.1109/ICTC49870.2020.9289179\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Distributed deep reinforcement learning can increase the train throughput, which is defined as the timesteps per second used for training, easily by just adding computing nodes to a cluster, which makes it an essential technique for solving complex problems. The more complicated the virtual learning environment and the policy network become, the more the CPU computing power in the rollout phase and the GPU computing power in the policy update phase is required. Recall that the reinforcement learning iterates the phases of acquiring data through rollout in the virtual learning environment and updating the policy from that data over millions of iterations. In this paper, the train throughput analysis is performed with RLlib and IMPALA on two different problems: CartPole, a simple problem, and Pong, a relatively complex problem. The effects of various scalability metrics, clustering, and observation dimensions on train throughput are analyzed. 
Throughout the analysis, we show that 1) the train throughput varies significantly according to the scalability metrics, 2) it is vital to monitor the bottleneck in the train throughput and configure the cluster accordingly, and 3) when the GPU computing power is the bottleneck, reducing the observation dimensions can be a great option as the train throughput increases up to 3 times by reducing the dimension from 84 to 42.\",\"PeriodicalId\":282243,\"journal\":{\"name\":\"2020 International Conference on Information and Communication Technology Convergence (ICTC)\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 International Conference on Information and Communication Technology Convergence (ICTC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICTC49870.2020.9289179\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International Conference on Information and Communication Technology Convergence (ICTC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICTC49870.2020.9289179","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Distributed deep reinforcement learning can increase the train throughput, defined as the timesteps per second consumed for training, simply by adding computing nodes to a cluster, which makes it an essential technique for solving complex problems. The more complicated the virtual learning environment and the policy network become, the more CPU computing power is required in the rollout phase and the more GPU computing power is required in the policy update phase. Recall that reinforcement learning alternates, over millions of iterations, between acquiring data through rollouts in the virtual learning environment and updating the policy from that data. In this paper, a train throughput analysis is performed with RLlib and IMPALA on two problems: CartPole, a simple one, and Pong, a relatively complex one. The effects of various scalability metrics, clustering, and observation dimensions on train throughput are analyzed. Throughout the analysis, we show that 1) the train throughput varies significantly with the scalability metrics, 2) it is vital to monitor the bottleneck in the train throughput and configure the cluster accordingly, and 3) when GPU computing power is the bottleneck, reducing the observation dimensions can be a great option: reducing the dimension from 84 to 42 increases the train throughput by up to 3 times.
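As an illustration of the measurement the abstract describes, here is a minimal sketch of training IMPALA on Pong with RLlib and computing train throughput as timesteps per second. It uses the RLlib 0.8-era `ImpalaTrainer` API that was current when the paper was published; the worker and GPU counts are placeholder assumptions, not the authors' cluster configuration.

```python
# Minimal sketch (not the authors' configuration): IMPALA on Pong with RLlib,
# reporting train throughput = timesteps per second, as defined in the paper.
# Uses the RLlib 0.8-era API that was current in 2020.
import ray
from ray.rllib.agents.impala import ImpalaTrainer

ray.init()  # start or connect to a Ray cluster; add nodes to scale out

trainer = ImpalaTrainer(
    env="PongNoFrameskip-v4",  # the "relatively complex" problem in the paper
    config={
        "num_workers": 8,   # CPU rollout workers (assumed value)
        "num_gpus": 1,      # GPU(s) for the policy-update (learner) phase
    },
)

for _ in range(10):
    result = trainer.train()
    # Cumulative timesteps divided by cumulative wall-clock seconds.
    throughput = result["timesteps_total"] / result["time_total_s"]
    print(f"train throughput: {throughput:.0f} timesteps/s")
```

Scaling `num_workers` adds CPU rollout capacity while `num_gpus` adds learner capacity; watching which side saturates first is the bottleneck monitoring that point 2) of the abstract calls for.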
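For point 3), RLlib's built-in Atari preprocessing exposes a model option `dim` (default 84) that sets the side length of the square frames fed to the policy network. Assuming the paper used this mechanism for the 84-to-42 reduction (the abstract does not say), the configuration would look like the following sketch.

```python
# Sketch of the 84 -> 42 observation reduction, assuming RLlib's Atari
# preprocessor "dim" option; the paper does not state its exact mechanism.
import ray
from ray.rllib.agents.impala import ImpalaTrainer

ray.init()

trainer = ImpalaTrainer(
    env="PongNoFrameskip-v4",
    config={
        "num_workers": 8,  # assumed value, as above
        "num_gpus": 1,
        "model": {
            "dim": 42,  # resize frames to 42x42 instead of the default 84x84
        },
    },
)
```

Halving the side length quarters the pixels per frame, so the convolutional work per policy update drops sharply; the abstract reports up to a 3x gain in train throughput from this change when the GPU is the bottleneck.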