Multi-trajectory optimization for train using distributional reinforcement learning with conditional value-at-risk

IF 7.2 | CAS Tier 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Yalan Chen, Jing Xun, Shibo He, Xin Wan, Yafei Liu
{"title":"基于条件风险值的分布式强化学习训练多轨迹优化","authors":"Yalan Chen ,&nbsp;Jing Xun ,&nbsp;Shibo He ,&nbsp;Xin Wan ,&nbsp;Yafei Liu","doi":"10.1016/j.asoc.2025.113079","DOIUrl":null,"url":null,"abstract":"<div><div>Artificial intelligence methods like reinforcement learning (RL) have been widely studied to train trajectory optimization problems to achieve flexible driving. To meet the demand for flexible driving strategies in actual operations, N optimized trajectories for the single train are usually generated based on different scheduled times. It brings up two issues: the computational cost of N trajectories is N times that of a single trajectory, and manual intervention is required to adjust the initial conditions, such as schedule time. This paper proposes a conditional value-at-risk (CVaR) distributional Q-learning approach (CDQ) to generate trajectories with different driving styles, balancing safety and efficiency. First, analyzing the actual control deviations, the distribution of returns is modeled using the quantile of distributional RL. Then, we introduce CVaR as a risk metric to evaluate the risk of actions and develop risk-sensitive strategies based on various confidence levels, simultaneously optimizing multiple trajectories for the single train. Finally, we simulate the experiments with data from an actual line. The results demonstrate that the CDQ algorithm can simultaneously optimize multiple train trajectories without requiring human intervention. Through a two-layer selection mechanism, five trajectories with varying driving styles can be selected to fulfill scheduling flexibility requirements. Compared to standard Q-learning, distributional Deep Q-Network and other risk-sensitive RL, CDQ shows improved performance in both energy-saving and punctuality. The total computation time of CDQ is only 31.47% and 35.44% of Q-learning and risk-sensitive RL.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"176 ","pages":"Article 113079"},"PeriodicalIF":7.2000,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-trajectory optimization for train using distributional reinforcement learning with conditional value-at-risk\",\"authors\":\"Yalan Chen ,&nbsp;Jing Xun ,&nbsp;Shibo He ,&nbsp;Xin Wan ,&nbsp;Yafei Liu\",\"doi\":\"10.1016/j.asoc.2025.113079\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Artificial intelligence methods like reinforcement learning (RL) have been widely studied to train trajectory optimization problems to achieve flexible driving. To meet the demand for flexible driving strategies in actual operations, N optimized trajectories for the single train are usually generated based on different scheduled times. It brings up two issues: the computational cost of N trajectories is N times that of a single trajectory, and manual intervention is required to adjust the initial conditions, such as schedule time. This paper proposes a conditional value-at-risk (CVaR) distributional Q-learning approach (CDQ) to generate trajectories with different driving styles, balancing safety and efficiency. First, analyzing the actual control deviations, the distribution of returns is modeled using the quantile of distributional RL. 
Then, we introduce CVaR as a risk metric to evaluate the risk of actions and develop risk-sensitive strategies based on various confidence levels, simultaneously optimizing multiple trajectories for the single train. Finally, we simulate the experiments with data from an actual line. The results demonstrate that the CDQ algorithm can simultaneously optimize multiple train trajectories without requiring human intervention. Through a two-layer selection mechanism, five trajectories with varying driving styles can be selected to fulfill scheduling flexibility requirements. Compared to standard Q-learning, distributional Deep Q-Network and other risk-sensitive RL, CDQ shows improved performance in both energy-saving and punctuality. The total computation time of CDQ is only 31.47% and 35.44% of Q-learning and risk-sensitive RL.</div></div>\",\"PeriodicalId\":50737,\"journal\":{\"name\":\"Applied Soft Computing\",\"volume\":\"176 \",\"pages\":\"Article 113079\"},\"PeriodicalIF\":7.2000,\"publicationDate\":\"2025-04-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Soft Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1568494625003904\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Soft Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1568494625003904","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Artificial intelligence methods such as reinforcement learning (RL) have been widely studied for train trajectory optimization problems to achieve flexible driving. To meet the demand for flexible driving strategies in actual operations, N optimized trajectories for a single train are usually generated based on different scheduled times. This raises two issues: the computational cost of N trajectories is N times that of a single trajectory, and manual intervention is required to adjust the initial conditions, such as the scheduled time. This paper proposes a conditional value-at-risk (CVaR) distributional Q-learning approach (CDQ) to generate trajectories with different driving styles, balancing safety and efficiency. First, by analyzing the actual control deviations, the distribution of returns is modeled using the quantiles of distributional RL. Then, we introduce CVaR as a risk metric to evaluate the risk of actions and develop risk-sensitive strategies at various confidence levels, simultaneously optimizing multiple trajectories for a single train. Finally, we conduct simulation experiments with data from an actual line. The results demonstrate that the CDQ algorithm can simultaneously optimize multiple train trajectories without requiring human intervention. Through a two-layer selection mechanism, five trajectories with varying driving styles can be selected to fulfill scheduling flexibility requirements. Compared to standard Q-learning, distributional Deep Q-Network, and other risk-sensitive RL methods, CDQ shows improved performance in both energy saving and punctuality. The total computation time of CDQ is only 31.47% and 35.44% of that of Q-learning and risk-sensitive RL, respectively.
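
To make the quantile and CVaR ideas in the abstract concrete, the following is a minimal tabular sketch of quantile-based distributional Q-learning with CVaR action selection. It is not the authors' CDQ implementation: the 51-atom quantile grid, learning rate, toy state/action sizes, and all function names are illustrative assumptions.

import numpy as np

# Minimal tabular sketch: a return distribution per (state, action) is stored as
# quantile atoms; CVaR over those atoms drives risk-sensitive action selection.
N_QUANTILES = 51
TAUS = (np.arange(N_QUANTILES) + 0.5) / N_QUANTILES  # quantile midpoints

def cvar_from_quantiles(quantiles: np.ndarray, alpha: float) -> float:
    """CVaR_alpha of a return distribution given by quantile atoms:
    the mean of the worst alpha-fraction of outcomes (alpha=1.0 is the plain mean)."""
    q = np.sort(quantiles)
    k = max(1, int(np.ceil(alpha * len(q))))
    return float(q[:k].mean())

def risk_sensitive_action(theta: np.ndarray, state: int, alpha: float) -> int:
    """Choose the action whose return distribution has the largest CVaR at level alpha.
    Small alpha gives a conservative (risk-averse) style; alpha=1.0 is risk-neutral."""
    n_actions = theta.shape[1]
    cvars = [cvar_from_quantiles(theta[state, a], alpha) for a in range(n_actions)]
    return int(np.argmax(cvars))

def quantile_td_update(theta, s, a, r, s_next, a_next, gamma=0.99, lr=0.05):
    """One tabular quantile-regression TD step: each atom of Z(s, a) follows the
    pinball-loss gradient toward the Bellman target r + gamma * Z(s', a')."""
    target = r + gamma * theta[s_next, a_next]       # target atoms, shape (N_QUANTILES,)
    for i, tau in enumerate(TAUS):
        below = (target < theta[s, a, i]).mean()     # fraction of target atoms below atom i
        theta[s, a, i] += lr * (tau - below)         # pinball-loss gradient step

# Usage on a toy problem: rank the actions of one state under several risk levels.
rng = np.random.default_rng(0)
theta = rng.normal(size=(10, 3, N_QUANTILES))        # hypothetical 10 states, 3 actions
for alpha in (0.1, 0.25, 0.5, 1.0):
    print(f"alpha={alpha}: greedy action {risk_sensitive_action(theta, state=0, alpha=alpha)}")

Sweeping alpha from small values toward 1.0 is one way to obtain a family of more or less conservative trajectories, i.e. the different driving styles mentioned above. The two-layer selection mechanism is not sketched here because the abstract does not describe its details.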
Source journal
Applied Soft Computing (Engineering & Technology - Computer Science: Interdisciplinary Applications)
CiteScore: 15.80
Self-citation rate: 6.90%
Articles published: 874
Review time: 10.9 months
About the journal: Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest quality research on the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets and other similar techniques to address real-world complexities. Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. Therefore, the web site is continuously updated with new articles and the publication time is short.