Interpretable Intersection Control by Reinforcement Learning Agent With Linear Function Approximator

IF 2.5 · CAS Tier 4 (Engineering & Technology) · JCR Q2 (Engineering, Electrical & Electronic)
Somporn Sahachaiseree, Takashi Oguchi
{"title":"Interpretable Intersection Control by Reinforcement Learning Agent With Linear Function Approximator","authors":"Somporn Sahachaiseree,&nbsp;Takashi Oguchi","doi":"10.1049/itr2.70034","DOIUrl":null,"url":null,"abstract":"<p>Reinforcement learning (RL) is a promising machine-learning solution to traffic signal control problems, which have been extensively studied. However, variants of non-linear, deep artificial neural network (ANN) function approximators (FAs) have been predominantly employed in previous studies proposing RL-based controllers, leaving a significant interpretability issue due to their black-box nature. In this work, the use of the linear FA for a value-based RL agent in traffic signal control problems is investigated along with the least-squares <span></span><math>\n <semantics>\n <mi>Q</mi>\n <annotation>$Q$</annotation>\n </semantics></math>-learning method, abbreviated as <span></span><math>\n <semantics>\n <mrow>\n <mi>LSTD</mi>\n <mi>Q</mi>\n </mrow>\n <annotation>${\\rm LSTD}Q$</annotation>\n </semantics></math>. The interpretable linear FA was found to be adequate for the RL agent to learn an optimal policy. This leads to the proposal to replace a non-linear ANN FA with the linear FA counterpart, resolving the interpretability issue. Moreover, the <span></span><math>\n <semantics>\n <mrow>\n <mi>LSTD</mi>\n <mi>Q</mi>\n </mrow>\n <annotation>${\\rm LSTD}Q$</annotation>\n </semantics></math> learning method shows superior behaviour convergence compared to a gradient descent method. In a low-intensity arrival pattern scenario, the control by the RL agent cuts about half of the average delay resulting from the pretimed control. Owing to the conciseness of the linear FA, a direct interpretation analysis of the converged linear-FA parameters is presented. Lastly, two online relearning tests of the agents under non-stationary arrivals are conducted to demonstrate the online performance of <span></span><math>\n <semantics>\n <mrow>\n <mi>LSTD</mi>\n <mi>Q</mi>\n </mrow>\n <annotation>${\\rm LSTD}Q$</annotation>\n </semantics></math>. In conclusion, the linear-FA specification and the <span></span><math>\n <semantics>\n <mrow>\n <mi>LSTD</mi>\n <mi>Q</mi>\n </mrow>\n <annotation>${\\rm LSTD}Q$</annotation>\n </semantics></math> method are together proposed to be used for its control algorithm interpretability property, superior convergence quality, and lack of hyperparameters.</p>","PeriodicalId":50381,"journal":{"name":"IET Intelligent Transport Systems","volume":"19 1","pages":""},"PeriodicalIF":2.5000,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/itr2.70034","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IET Intelligent Transport Systems","FirstCategoryId":"5","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/itr2.70034","RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Reinforcement learning (RL) is a promising machine-learning solution to traffic signal control problems, which have been studied extensively. However, previous studies proposing RL-based controllers have predominantly employed variants of non-linear, deep artificial neural network (ANN) function approximators (FAs), leaving a significant interpretability issue due to their black-box nature. In this work, the use of a linear FA for a value-based RL agent in traffic signal control problems is investigated, together with the least-squares $Q$-learning method, abbreviated as ${\rm LSTD}Q$. The interpretable linear FA was found to be adequate for the RL agent to learn an optimal policy, leading to the proposal to replace a non-linear ANN FA with its linear counterpart and thereby resolve the interpretability issue. Moreover, the ${\rm LSTD}Q$ learning method shows superior convergence behaviour compared with a gradient-descent method. In a low-intensity arrival-pattern scenario, control by the RL agent cuts the average delay of the pretimed control roughly in half. Owing to the conciseness of the linear FA, a direct interpretation analysis of the converged linear-FA parameters is presented. Lastly, two online relearning tests of the agents under non-stationary arrivals are conducted to demonstrate the online performance of ${\rm LSTD}Q$. In conclusion, the linear-FA specification and the ${\rm LSTD}Q$ method are jointly proposed for their control-algorithm interpretability, superior convergence quality, and absence of hyperparameters.
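The paper itself does not publish code, but as a rough sketch of the technique named in the abstract, the following shows a generic ${\rm LSTD}Q$ fit of a linear $Q$-function $Q(s,a)=\theta^\top \phi(s,a)$ from a batch of transitions, alternated with greedy policy improvement. Everything here is illustrative rather than taken from the paper: the feature map `phi`, the action set `ACTIONS`, the transition batch, and the discount and ridge values are placeholder assumptions.

```python
import numpy as np

def lstdq(transitions, phi, policy, n_features, gamma=0.95, ridge=1e-6):
    """One LSTDQ solve for a linear Q-function Q(s, a) = theta . phi(s, a).

    Accumulates A = sum phi(s,a) (phi(s,a) - gamma * phi(s', pi(s')))^T and
    b = sum r * phi(s,a) over (s, a, r, s') samples, then returns theta = A^-1 b.
    The small ridge term keeps A invertible when features are collinear.
    gamma and ridge are illustrative values, not the paper's settings.
    """
    A = ridge * np.eye(n_features)
    b = np.zeros(n_features)
    for s, a, r, s_next in transitions:
        f = phi(s, a)                         # features of the visited state-action pair
        f_next = phi(s_next, policy(s_next))  # features of the evaluated policy's next action
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)              # closed-form solve: no step size to tune

# --- hypothetical placeholders for illustration only ---
ACTIONS = (0, 1)                              # e.g., keep the current phase vs. switch phase
n_features = 4

def phi(state, action):
    """Toy feature map: per-approach queue lengths gated by the chosen action."""
    q_ns, q_ew = state                        # queue lengths, north-south and east-west
    return np.array([q_ns, q_ew, 0.0, 0.0]) if action == 0 else np.array([0.0, 0.0, q_ns, q_ew])

batch = [((5, 2), 0, -7.0, (4, 3)), ((4, 3), 1, -7.0, (5, 1))]   # (s, a, r, s') samples

# Least-squares policy iteration: alternate LSTDQ fits with greedy improvement.
theta = np.zeros(n_features)
for _ in range(20):
    greedy = lambda s, th=theta: max(ACTIONS, key=lambda a: float(phi(s, a) @ th))
    theta = lstdq(batch, phi, greedy, n_features)
```

Because the fitted $\theta$ assigns one weight to each traffic feature (for example, queue lengths per approach in a typical state encoding), the agent's preference for keeping or switching a phase can be read directly from the signs and magnitudes of those weights, which is the interpretability argument made in the abstract; the closed-form solve is also what removes the step-size and related hyperparameters of gradient-descent training.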


Source Journal
IET Intelligent Transport Systems (Engineering & Technology - Transportation Science & Technology)
CiteScore: 6.50
Self-citation rate: 7.40%
Articles published: 159
Review time: 3 months
Journal Description:
IET Intelligent Transport Systems is an interdisciplinary journal devoted to research into the practical applications of ITS and infrastructures. The scope of the journal includes the following:
Sustainable traffic solutions
Deployments with enabling technologies
Pervasive monitoring
Applications; demonstrations and evaluation
Economic and behavioural analyses of ITS services and scenario
Data integration and analytics
Information collection and processing; image processing applications in ITS
ITS aspects of electric vehicles
Autonomous vehicles; connected vehicle systems; in-vehicle ITS, safety and vulnerable road user aspects
Mobility as a service systems
Traffic management and control
Public transport systems technologies
Fleet and public transport logistics
Emergency and incident management
Demand management and electronic payment systems
Traffic related air pollution management
Policy and institutional issues
Interoperability, standards and architectures
Funding scenarios
Enforcement
Human machine interaction
Education, training and outreach
Current Special Issue Calls for Papers:
Intelligent Transportation Systems in Smart Cities for Sustainable Environment - https://digital-library.theiet.org/files/IET_ITS_CFP_ITSSCSE.pdf
Sustainably Intelligent Mobility (SIM) - https://digital-library.theiet.org/files/IET_ITS_CFP_SIM.pdf
Traffic Theory and Modelling in the Era of Artificial Intelligence and Big Data (in collaboration with World Congress for Transport Research, WCTR 2019) - https://digital-library.theiet.org/files/IET_ITS_CFP_WCTR.pdf