{"title":"Distributed Deep Reinforcement Learning on the Cloud for Autonomous Driving","authors":"Mitchell Spryn, Aditya Sharma, Dhawal Parkar, Madhu Shrimal","doi":"10.1145/3194085.3194088","DOIUrl":null,"url":null,"abstract":"This paper proposes an architecture for leveraging cloud computing technology to reduce training time for deep reinforcement learning models for autonomous driving by distributing the training process across a pool of virtual machines. By parallelizing the training process, careful design of the reward function and use of techniques like transfer learning, we demonstrate a decrease in training time for our example autonomous driving problem from 140 hours to less than 1 hour. We go over our network architecture, job distribution paradigm, reward function design and report results from experiments on small sized cluster (1-6 training nodes) of machines. We also discuss the limitations of our approach when trying to scale up to massive clusters.","PeriodicalId":360022,"journal":{"name":"2018 IEEE/ACM 1st International Workshop on Software Engineering for AI in Autonomous Systems (SEFAIAS)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE/ACM 1st International Workshop on Software Engineering for AI in Autonomous Systems (SEFAIAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3194085.3194088","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12
Abstract
This paper proposes an architecture that leverages cloud computing technology to reduce the training time of deep reinforcement learning models for autonomous driving by distributing the training process across a pool of virtual machines. Through parallelization of the training process, careful design of the reward function, and the use of techniques such as transfer learning, we demonstrate a decrease in training time for our example autonomous driving problem from 140 hours to less than 1 hour. We describe our network architecture, job distribution paradigm, and reward function design, and report results from experiments on a small cluster of machines (1-6 training nodes). We also discuss the limitations of our approach when scaling up to massive clusters.
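The abstract states that training is distributed across a pool of worker machines but does not spell out the job-distribution paradigm at this level of detail. The following is a minimal illustrative sketch only, assuming a parameter-server-style round: a coordinator broadcasts the current weights, each worker collects experience locally and returns a gradient estimate, and the coordinator averages and applies the updates. All names (run_worker, train), the toy linear Q-function, and the stand-in reward are assumptions made for illustration, not the authors' implementation.

```python
"""Hedged sketch of parameter-server-style distributed RL training.

Not the paper's code: the toy environment, linear Q-function, reward,
and function names below are illustrative assumptions.
"""
from concurrent.futures import ProcessPoolExecutor

import numpy as np

STATE_DIM, N_ACTIONS = 8, 4      # toy problem size (assumed)
LEARNING_RATE, GAMMA = 1e-2, 0.99


def run_worker(weights: np.ndarray, episodes: int, seed: int) -> np.ndarray:
    """Simulate one training node: roll out a toy environment with the
    current linear Q-function and return an accumulated gradient estimate."""
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(weights)
    for _ in range(episodes):
        state = rng.normal(size=STATE_DIM)
        for _ in range(50):                              # fixed-length toy episode
            q = weights @ state                          # Q-values for all actions
            if rng.random() > 0.1:                       # epsilon-greedy action choice
                action = int(np.argmax(q))
            else:
                action = int(rng.integers(N_ACTIONS))
            reward = -np.linalg.norm(state)              # stand-in reward (assumed)
            next_state = 0.9 * state + rng.normal(scale=0.1, size=STATE_DIM)
            td_error = reward + GAMMA * np.max(weights @ next_state) - q[action]
            grad[action] += td_error * state             # semi-gradient TD update
            state = next_state
    return grad / episodes


def train(n_workers: int = 4, rounds: int = 20) -> np.ndarray:
    """Coordinator: broadcast weights, gather per-worker gradients,
    average them, and apply one update per round."""
    weights = np.zeros((N_ACTIONS, STATE_DIM))
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        for r in range(rounds):
            futures = [pool.submit(run_worker, weights, 10, 1000 * r + i)
                       for i in range(n_workers)]
            grads = [f.result() for f in futures]
            weights += LEARNING_RATE * np.mean(grads, axis=0)
    return weights


if __name__ == "__main__":
    final_weights = train()
    print("trained weight norm:", np.linalg.norm(final_weights))
```

In this sketch, adding worker processes increases the experience gathered per round without lengthening the round itself, which is the same lever the paper pulls when it scales from 1 to 6 training nodes on virtual machines.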