{"title":"自动驾驶云上的分布式深度强化学习","authors":"Mitchell Spryn, Aditya Sharma, Dhawal Parkar, Madhu Shrimal","doi":"10.1145/3194085.3194088","DOIUrl":null,"url":null,"abstract":"This paper proposes an architecture for leveraging cloud computing technology to reduce training time for deep reinforcement learning models for autonomous driving by distributing the training process across a pool of virtual machines. By parallelizing the training process, careful design of the reward function and use of techniques like transfer learning, we demonstrate a decrease in training time for our example autonomous driving problem from 140 hours to less than 1 hour. We go over our network architecture, job distribution paradigm, reward function design and report results from experiments on small sized cluster (1-6 training nodes) of machines. We also discuss the limitations of our approach when trying to scale up to massive clusters.","PeriodicalId":360022,"journal":{"name":"2018 IEEE/ACM 1st International Workshop on Software Engineering for AI in Autonomous Systems (SEFAIAS)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Distributed Deep Reinforcement Learning on the Cloud for Autonomous Driving\",\"authors\":\"Mitchell Spryn, Aditya Sharma, Dhawal Parkar, Madhu Shrimal\",\"doi\":\"10.1145/3194085.3194088\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper proposes an architecture for leveraging cloud computing technology to reduce training time for deep reinforcement learning models for autonomous driving by distributing the training process across a pool of virtual machines. By parallelizing the training process, careful design of the reward function and use of techniques like transfer learning, we demonstrate a decrease in training time for our example autonomous driving problem from 140 hours to less than 1 hour. We go over our network architecture, job distribution paradigm, reward function design and report results from experiments on small sized cluster (1-6 training nodes) of machines. 
We also discuss the limitations of our approach when trying to scale up to massive clusters.\",\"PeriodicalId\":360022,\"journal\":{\"name\":\"2018 IEEE/ACM 1st International Workshop on Software Engineering for AI in Autonomous Systems (SEFAIAS)\",\"volume\":\"45 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-05-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE/ACM 1st International Workshop on Software Engineering for AI in Autonomous Systems (SEFAIAS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3194085.3194088\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE/ACM 1st International Workshop on Software Engineering for AI in Autonomous Systems (SEFAIAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3194085.3194088","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Distributed Deep Reinforcement Learning on the Cloud for Autonomous Driving
This paper proposes an architecture that leverages cloud computing to reduce the training time of deep reinforcement learning models for autonomous driving by distributing the training process across a pool of virtual machines. By parallelizing the training process, carefully designing the reward function, and using techniques such as transfer learning, we demonstrate a reduction in training time for our example autonomous driving problem from 140 hours to less than 1 hour. We describe our network architecture, job distribution paradigm, and reward function design, and report results from experiments on a small cluster (1-6 training nodes). We also discuss the limitations of our approach when scaling up to massive clusters.
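To make the idea of distributing rollouts across a pool of training nodes concrete, the sketch below shows a minimal coordinator/worker loop together with a hypothetical shaped reward for lane keeping. It is not the authors' implementation: the paper's actual network, job distribution mechanism, and reward function are not reproduced here, and every name, constant, and the toy "rollout" simulation are assumptions made purely for illustration.

```python
# Illustrative sketch (not the paper's code): distribute rollouts across a pool
# of worker processes standing in for cloud training nodes, then average the
# returned gradient estimates on a coordinator. All names and constants are
# hypothetical; a real setup would talk to VMs and a driving simulator.

import random
from concurrent.futures import ProcessPoolExecutor

NUM_WORKERS = 4        # e.g. 1-6 training nodes, as in the paper's experiments
WEIGHT_DIM = 8         # toy policy represented as a flat weight vector
LEARNING_RATE = 0.01


def reward(distance_from_center: float, speed: float) -> float:
    """Hypothetical shaped reward: stay near the lane center, keep moving.

    The paper emphasizes careful reward design; its exact function is not
    given here, so this is only an illustrative stand-in.
    """
    return max(0.0, 1.0 - abs(distance_from_center)) * min(speed, 1.0)


def run_rollout(weights):
    """Worker-side job: simulate one episode and return a gradient estimate.

    A real worker would step a driving simulator and backpropagate through the
    policy network; here the gradient is random noise scaled by the episode
    return, just to keep the sketch self-contained and runnable.
    """
    episode_return = sum(
        reward(random.uniform(-1, 1), random.uniform(0, 1)) for _ in range(100)
    )
    return [random.gauss(0, 1) * episode_return for _ in weights]


def training_step(weights, executor):
    """Coordinator-side job distribution: farm out rollouts, average gradients."""
    grads = list(executor.map(run_rollout, [weights] * NUM_WORKERS))
    avg_grad = [sum(g[i] for g in grads) / NUM_WORKERS for i in range(WEIGHT_DIM)]
    return [w + LEARNING_RATE * g for w, g in zip(weights, avg_grad)]


if __name__ == "__main__":
    weights = [0.0] * WEIGHT_DIM
    with ProcessPoolExecutor(max_workers=NUM_WORKERS) as pool:
        for step in range(10):
            weights = training_step(weights, pool)
    print("final weights:", weights)
```

In this sketch the process pool plays the role of the virtual-machine pool: adding workers increases the number of rollouts gathered per update, which is the basic mechanism by which wall-clock training time drops, while coordination overhead is what eventually limits scaling to very large clusters.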