Multi-trajectory optimization for train using distributional reinforcement learning with conditional value-at-risk

IF 7.2 | Zone 1 (Computer Science) | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Applied Soft Computing Pub Date : 2025-04-08 DOI:10.1016/j.asoc.2025.113079

Yalan Chen, Jing Xun, Shibo He, Xin Wan, Yafei Liu

Applied Soft Computing, Volume 176, Article 113079. Source: https://www.sciencedirect.com/science/article/pii/S1568494625003904

Citations: 0

Multi-trajectory optimization for train using distributional reinforcement learning with conditional value-at-risk

Artificial intelligence methods such as reinforcement learning (RL) have been widely studied for train trajectory optimization to achieve flexible driving. To meet the demand for flexible driving strategies in actual operations, N optimized trajectories for a single train are usually generated based on different scheduled times. This raises two issues: the computational cost of N trajectories is N times that of a single trajectory, and manual intervention is required to adjust initial conditions such as the scheduled time. This paper proposes a conditional value-at-risk (CVaR) distributional Q-learning approach (CDQ) that generates trajectories with different driving styles, balancing safety and efficiency. First, based on an analysis of actual control deviations, the distribution of returns is modeled using the quantiles of distributional RL. Then, CVaR is introduced as a risk metric to evaluate the risk of actions, and risk-sensitive strategies are developed for various confidence levels, so that multiple trajectories for a single train are optimized simultaneously. Finally, simulation experiments are carried out with data from an actual line. The results demonstrate that the CDQ algorithm can optimize multiple train trajectories simultaneously without human intervention. Through a two-layer selection mechanism, five trajectories with varying driving styles can be selected to fulfill scheduling flexibility requirements. Compared with standard Q-learning, distributional Deep Q-Network, and other risk-sensitive RL methods, CDQ shows improved performance in both energy saving and punctuality. The total computation time of CDQ is only 31.47% and 35.44% of that of Q-learning and risk-sensitive RL, respectively.
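
The CVaR step described in the abstract can be illustrated with a minimal sketch. It assumes that each action's return distribution is represented by N equally weighted quantile estimates, as is common in quantile-based distributional RL, and that CVaR at level alpha is taken as the mean of the worst ceil(alpha * N) quantile values. The function names, the action labels, and the toy quantile values are hypothetical; the paper's actual CDQ update rule, reward design, and two-layer selection mechanism are not reproduced here.

```python
import math


def cvar_from_quantiles(quantiles, alpha):
    """Lower-tail CVaR: mean of the worst ceil(alpha * N) equally weighted quantile values."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(quantiles)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def select_action(quantiles_by_action, alpha):
    """Pick the action whose return distribution has the highest CVaR at level alpha."""
    return max(
        quantiles_by_action,
        key=lambda action: cvar_from_quantiles(quantiles_by_action[action], alpha),
    )


# Hypothetical quantile estimates of the return for three actions in one state.
quantiles_by_action = {
    "accelerate": [-10.0, 2.0, 8.0, 12.0],  # highest mean, but a heavy lower tail
    "coast": [-1.0, 0.0, 1.0, 2.0],         # lower mean, narrow spread
    "brake": [-3.0, -2.0, -1.0, 0.0],
}

# Smaller alpha focuses on the worst outcomes (risk-averse);
# alpha = 1.0 averages all quantiles and is therefore risk-neutral.
for alpha in (0.25, 0.5, 1.0):
    print(alpha, select_action(quantiles_by_action, alpha))
```

In this toy setting the risk-averse levels (0.25 and 0.5) prefer "coast", while the risk-neutral level (1.0) prefers "accelerate". This is one way a single learned return distribution could yield several driving styles, by varying only the confidence level.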

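Source journal

Applied Soft Computing (Engineering & Technology, Computer Science: Interdisciplinary Applications)

CiteScore: 15.80

Self-citation rate: 6.90%

Articles published: 874

Review time: 10.9 months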

Journal description: Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest-quality research on the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and other similar techniques to address real-world complexities. Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The website is therefore updated continuously with new articles, and the publication time is short.