Reinforcement learning with average cost for adaptive control of traffic lights at intersections

2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC) Pub Date: 2011-11-18 DOI: 10.1109/ITSC.2011.6082823

Prashanth L. A., S. Bhatnagar

Citations: 64

Abstract

We propose for the first time two reinforcement learning algorithms with function approximation for average cost adaptive control of traffic lights. One of these algorithms is a version of Q-learning with function approximation while the other is a policy gradient actor-critic algorithm that incorporates multi-timescale stochastic approximation. We show performance comparisons on various network settings of these algorithms with a range of fixed timing algorithms, as well as a Q-learning algorithm with full state representation that we also implement. We observe that whereas (as expected) on a two-junction corridor, the full state representation algorithm shows the best results, this algorithm is not implementable on larger road networks. The algorithm PG-AC-TLC that we propose is seen to show the best overall performance.
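As a rough illustration of what Q-learning with function approximation under an average cost criterion can look like, here is a minimal sketch on a toy single-intersection model. It is not the authors' algorithm, and it does not cover the actor-critic method PG-AC-TLC. Everything in it is an assumption made for illustration: the two-queue environment, the arrival probabilities, the queue cap `CAP`, the linear features, and the step sizes `alpha` and `beta`. The average cost is tracked as a slow running mean of observed costs and subtracted from each one-step cost in the temporal-difference error.

```python
import random

CAP = 10               # hypothetical cap on each queue length
ARRIVAL = (0.3, 0.2)   # hypothetical per-step arrival probability per approach


def features(state, action):
    """Linear features: a [1, q0/CAP, q1/CAP] block in the slot of the chosen action."""
    block = [1.0, state[0] / CAP, state[1] / CAP]
    zeros = [0.0, 0.0, 0.0]
    return block + zeros if action == 0 else zeros + block


def q_value(theta, state, action):
    """Approximate differential cost of taking `action` in `state`."""
    return sum(t * f for t, f in zip(theta, features(state, action)))


def greedy_action(theta, state):
    """Action with the lowest estimated cost; ties go to action 0."""
    return min((0, 1), key=lambda a: q_value(theta, state, a))


def step(state, action, rng):
    """Toy dynamics: the approach with green discharges one vehicle, then arrivals occur."""
    queues = list(state)
    if queues[action] > 0:
        queues[action] -= 1
    for i, p in enumerate(ARRIVAL):
        if rng.random() < p and queues[i] < CAP:
            queues[i] += 1
    next_state = tuple(queues)
    # Single-stage cost: total number of queued vehicles.
    return next_state, float(sum(next_state))


def train(steps=20000, alpha=0.01, beta=0.001, epsilon=0.1, seed=0):
    """Average-cost Q-learning with linear function approximation (illustrative only)."""
    rng = random.Random(seed)
    theta = [0.0] * 6
    avg_cost = 0.0
    state = (0, 0)
    for _ in range(steps):
        # Epsilon-greedy exploration over the two signal phases.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = greedy_action(theta, state)
        next_state, cost = step(state, action, rng)
        # Slow running estimate of the long-run average cost.
        avg_cost += beta * (cost - avg_cost)
        # TD error with the average cost subtracted instead of discounting.
        best_next = min(q_value(theta, next_state, a) for a in (0, 1))
        delta = cost - avg_cost + best_next - q_value(theta, state, action)
        phi = features(state, action)
        theta = [t + alpha * delta * f for t, f in zip(theta, phi)]
        state = next_state
    return theta, avg_cost
```

Calling `train()` returns the learned weight vector and the running average-cost estimate. The weight vector has six entries regardless of how large `CAP` is, whereas a full-state table would need one entry per (queue lengths, action) combination, which hints at why a full state representation becomes impractical on larger road networks.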
