基于平均代价的交叉口交通灯自适应控制的强化学习

A. PrashanthL., S. Bhatnagar
{"title":"基于平均代价的交叉口交通灯自适应控制的强化学习","authors":"A. PrashanthL., S. Bhatnagar","doi":"10.1109/ITSC.2011.6082823","DOIUrl":null,"url":null,"abstract":"We propose for the first time two reinforcement learning algorithms with function approximation for average cost adaptive control of traffic lights. One of these algorithms is a version of Q-learning with function approximation while the other is a policy gradient actor-critic algorithm that incorporates multi-timescale stochastic approximation. We show performance comparisons on various network settings of these algorithms with a range of fixed timing algorithms, as well as a Q-learning algorithm with full state representation that we also implement. We observe that whereas (as expected) on a two-junction corridor, the full state representation algorithm shows the best results, this algorithm is not implementable on larger road networks. The algorithm PG-AC-TLC that we propose is seen to show the best overall performance.","PeriodicalId":186596,"journal":{"name":"2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"64","resultStr":"{\"title\":\"Reinforcement learning with average cost for adaptive control of traffic lights at intersections\",\"authors\":\"A. PrashanthL., S. Bhatnagar\",\"doi\":\"10.1109/ITSC.2011.6082823\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose for the first time two reinforcement learning algorithms with function approximation for average cost adaptive control of traffic lights. One of these algorithms is a version of Q-learning with function approximation while the other is a policy gradient actor-critic algorithm that incorporates multi-timescale stochastic approximation. We show performance comparisons on various network settings of these algorithms with a range of fixed timing algorithms, as well as a Q-learning algorithm with full state representation that we also implement. We observe that whereas (as expected) on a two-junction corridor, the full state representation algorithm shows the best results, this algorithm is not implementable on larger road networks. The algorithm PG-AC-TLC that we propose is seen to show the best overall performance.\",\"PeriodicalId\":186596,\"journal\":{\"name\":\"2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC)\",\"volume\":\"39 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-11-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"64\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ITSC.2011.6082823\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ITSC.2011.6082823","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 64

摘要

首次提出了两种基于函数逼近的强化学习算法,用于交通信号灯的平均成本自适应控制。其中一种算法是带有函数逼近的Q-learning版本,而另一种算法是包含多时间尺度随机逼近的策略梯度角色批评算法。我们展示了这些算法在各种网络设置上与一系列固定定时算法的性能比较,以及我们也实现的具有全状态表示的q -学习算法。我们观察到,尽管(如预期的)在双路口走廊上,全状态表示算法显示出最好的结果,但该算法不适用于更大的道路网络。我们提出的PG-AC-TLC算法显示出最佳的综合性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Reinforcement learning with average cost for adaptive control of traffic lights at intersections
We propose for the first time two reinforcement learning algorithms with function approximation for average cost adaptive control of traffic lights. One of these algorithms is a version of Q-learning with function approximation while the other is a policy gradient actor-critic algorithm that incorporates multi-timescale stochastic approximation. We show performance comparisons on various network settings of these algorithms with a range of fixed timing algorithms, as well as a Q-learning algorithm with full state representation that we also implement. We observe that whereas (as expected) on a two-junction corridor, the full state representation algorithm shows the best results, this algorithm is not implementable on larger road networks. The algorithm PG-AC-TLC that we propose is seen to show the best overall performance.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信