Simulation to scaled city: zero-shot policy transfer for traffic control via autonomous vehicles

Kathy Jang, Logan E. Beaver, Behdad Chalaki, Ben Remer, Eugene Vinitsky, Andreas A. Malikopoulos, A. Bayen

Proceedings of the 10th ACM/IEEE International Conference on Cyber-Physical Systems. DOI: 10.1145/3302509.3313784. Published 2018-12-14.
Using deep reinforcement learning, we successfully train two autonomous vehicles to lead a fleet of vehicles onto a roundabout, and then transfer this policy from simulation to a scaled city without fine-tuning. We use Flow, a library for deep reinforcement learning in microsimulators, to train two policies: (1) a policy with noise injected into the state and action space, and (2) a policy without any injected noise. In simulation, the autonomous vehicles learn an emergent metering behavior under both policies that allows smooth merging. We then transfer each policy, without any tuning, to the University of Delaware's Scaled Smart City (UDSSC), a 1:25 scale testbed for connected and automated vehicles. We characterize the performance of the transferred policies by how thoroughly the ramp metering behavior is reproduced in UDSSC. We show that the noise-free policy results in severe slowdowns and only occasionally exhibits acceptable metering behavior. The noise-injected policy, on the other hand, consistently exhibits acceptable metering behavior, implying that the injected noise aids the zero-shot policy transfer. Finally, the transferred noise-injected policy yields a 5% reduction in average travel time and a 22% reduction in maximum travel time in the UDSSC. Videos of the proposed self-learning controllers can be found at https://sites.google.com/view/iccps-policy-transfer.
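The noise-injection scheme the abstract describes can be pictured as a thin wrapper around the training environment that perturbs both what the policy observes and what it commands. The sketch below is illustrative only: the class and parameter names are hypothetical and do not reflect Flow's actual API, and Gaussian noise with a fixed standard deviation is one common choice, not necessarily the paper's exact setup.

```python
import numpy as np

class NoiseInjectionWrapper:
    """Hypothetical wrapper that injects zero-mean Gaussian noise into the
    observation (state) and action spaces of a Gym-style environment,
    as a robustness aid for sim-to-real (zero-shot) policy transfer."""

    def __init__(self, env, obs_noise_std=0.1, action_noise_std=0.1, seed=None):
        self.env = env
        self.obs_noise_std = obs_noise_std
        self.action_noise_std = action_noise_std
        self.rng = np.random.default_rng(seed)

    def _perturb(self, x, std):
        # Add i.i.d. Gaussian noise matching the array's shape.
        x = np.asarray(x, dtype=float)
        return x + self.rng.normal(0.0, std, size=x.shape)

    def reset(self):
        obs = self.env.reset()
        return self._perturb(obs, self.obs_noise_std)

    def step(self, action):
        # Perturb the commanded action before it reaches the simulator,
        # then perturb the resulting observation before the policy sees it.
        noisy_action = self._perturb(action, self.action_noise_std)
        obs, reward, done, info = self.env.step(noisy_action)
        return self._perturb(obs, self.obs_noise_std), reward, done, info
```

Training against such a wrapper forces the policy to tolerate sensing and actuation errors it will inevitably encounter on physical hardware, which is the intuition behind the noise-injected policy transferring more reliably to UDSSC than the noise-free one.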