Simulation to scaled city: zero-shot policy transfer for traffic control via autonomous vehicles

Kathy Jang, Logan E. Beaver, Behdad Chalaki, Ben Remer, Eugene Vinitsky, Andreas A. Malikopoulos, A. Bayen

Proceedings of the 10th ACM/IEEE International Conference on Cyber-Physical Systems, 2018-12-14. DOI: 10.1145/3302509.3313784

Citations: 63
Abstract
Using deep reinforcement learning, we successfully train two autonomous vehicles to lead a fleet of vehicles onto a roundabout and then transfer this policy from simulation to a scaled city without fine-tuning. We use Flow, a library for deep reinforcement learning in microsimulators, to train two policies: (1) a policy with noise injected into the state and action spaces and (2) a policy without any injected noise. In simulation, the autonomous vehicles learn an emergent metering behavior under both policies that allows smooth merging. We then transfer each policy, without any tuning, to the University of Delaware's Scaled Smart City (UDSSC), a 1:25 scale testbed for connected and automated vehicles. We characterize the performance of the transferred policies by how thoroughly the ramp-metering behavior is reproduced in UDSSC. We show that the noise-free policy results in severe slowdowns and only occasionally exhibits acceptable metering behavior; the noise-injected policy, by contrast, consistently exhibits acceptable metering behavior, implying that the injected noise aids the zero-shot policy transfer. Finally, the transferred noise-injected policy reduces average travel time by 5% and maximum travel time by 22% in UDSSC. Videos of the proposed self-learning controllers can be found at https://sites.google.com/view/iccps-policy-transfer.
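The noise-injection idea described in the abstract can be sketched as an environment wrapper that perturbs observations and actions with Gaussian noise during training. This is a minimal, hypothetical illustration of the general technique; the class name, noise model, and interface are assumptions for clarity and do not reproduce Flow's actual API.

```python
import numpy as np


class NoiseInjectionWrapper:
    """Hypothetical sketch: wrap an RL environment and add Gaussian noise
    to observations and actions, illustrating the general idea behind the
    paper's noise-injected policy (the actual Flow API differs)."""

    def __init__(self, env, obs_noise_std=0.1, act_noise_std=0.1, seed=0):
        self.env = env
        self.obs_noise_std = obs_noise_std
        self.act_noise_std = act_noise_std
        self.rng = np.random.default_rng(seed)

    def _perturb(self, x, std):
        # Add zero-mean Gaussian noise with the given standard deviation.
        return np.asarray(x) + self.rng.normal(0.0, std, size=np.shape(x))

    def reset(self):
        obs = self.env.reset()
        return self._perturb(obs, self.obs_noise_std)

    def step(self, action):
        # Perturb the action before it reaches the simulator, and the
        # observation before it reaches the policy.
        noisy_action = self._perturb(action, self.act_noise_std)
        obs, reward, done, info = self.env.step(noisy_action)
        return self._perturb(obs, self.obs_noise_std), reward, done, info


class DummyEnv:
    """Trivial stand-in environment used only to demonstrate the wrapper."""

    def reset(self):
        return np.zeros(3)

    def step(self, action):
        return np.zeros(3), 1.0, False, {}


if __name__ == "__main__":
    env = NoiseInjectionWrapper(DummyEnv(), obs_noise_std=0.1,
                                act_noise_std=0.1, seed=42)
    obs = env.reset()
    print(obs.shape)  # observations keep their shape, but are perturbed
```

Training under such perturbations encourages the policy to tolerate the sensing and actuation mismatch between the simulator and the physical UDSSC testbed, which is consistent with the abstract's finding that only the noise-injected policy transferred reliably.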