Simulation to scaled city: zero-shot policy transfer for traffic control via autonomous vehicles

Kathy Jang, Logan E. Beaver, Behdad Chalaki, Ben Remer, Eugene Vinitsky, Andreas A. Malikopoulos, A. Bayen
Proceedings of the 10th ACM/IEEE International Conference on Cyber-Physical Systems. Publication date: 2018-12-14. DOI: 10.1145/3302509.3313784
Citations: 63

Abstract

Using deep reinforcement learning, we train a pair of autonomous vehicles to lead a fleet of vehicles onto a roundabout and then transfer the learned policy from simulation to a scaled city without fine-tuning. We use Flow, a library for deep reinforcement learning in microsimulators, to train two policies: (1) a policy with noise injected into the state and action spaces and (2) a policy without any injected noise. In simulation, the autonomous vehicles learn an emergent metering behavior under both policies, which allows smooth merging. We then transfer each policy directly, without any tuning, to the University of Delaware's Scaled Smart City (UDSSC), a 1:25 scale testbed for connected and automated vehicles. We characterize the performance of each transferred policy by how thoroughly the ramp metering behavior is captured in UDSSC. The noise-free policy results in severe slowdowns and only occasionally exhibits acceptable metering behavior. The noise-injected policy, by contrast, consistently exhibits acceptable metering behavior, implying that the injected noise aids zero-shot policy transfer. Finally, the transferred noise-injected policy reduces average travel time by 5% and maximum travel time by 22% in UDSSC. Videos of the proposed self-learning controllers can be found at https://sites.google.com/view/iccps-policy-transfer.
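
To make the idea of injecting noise into the state and action spaces concrete, the following is a minimal sketch and not the authors' implementation. It assumes zero-mean Gaussian noise, which the abstract does not specify, and does not use the Flow API. The names `add_gaussian_noise`, `noisy_policy_step`, and `toy_policy`, as well as the standard deviations, are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def add_gaussian_noise(values: Sequence[float], std: float,
                       rng: random.Random) -> List[float]:
    """Return a copy of `values` with independent zero-mean Gaussian noise.

    Gaussian noise is an assumption made for illustration only.
    """
    return [v + rng.gauss(0.0, std) for v in values]


def noisy_policy_step(policy: Callable[[Sequence[float]], Sequence[float]],
                      state: Sequence[float],
                      state_std: float,
                      action_std: float,
                      rng: random.Random) -> List[float]:
    """Perturb the observed state, query the policy, then perturb its action."""
    noisy_state = add_gaussian_noise(state, state_std, rng)
    action = policy(noisy_state)
    return add_gaussian_noise(action, action_std, rng)


def toy_policy(state: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a trained policy.

    Accelerates in proportion to the gap between the observed speed
    (state[0]) and a target speed of 1.0.
    """
    return [0.5 * (1.0 - state[0])]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    action = noisy_policy_step(toy_policy, [0.8], state_std=0.1,
                               action_std=0.1, rng=rng)
    print(action)
```

Setting both standard deviations to zero recovers the noise-free case, so the same wrapper can describe both training configurations mentioned in the abstract.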