Offline reinforcement learning strategies guided meta-heuristics for scheduling bi-objective unmanned surface vessel problems with multiple constraints

IF 8.5; Zone 1, Computer Science; JCR Q1, Computer Science, Artificial Intelligence

Swarm and Evolutionary Computation Pub Date : 2025-09-27 DOI:10.1016/j.swevo.2025.102159

Wuze Huang, Kaizhou Gao, Naiqi Wu, Liang Zhao, Renato Tinós

Volume 99, Article 102159. Full text: https://www.sciencedirect.com/science/article/pii/S2210650225003165

Citations: 0

Abstract

This study proposes a reinforcement-learning-guided meta-heuristic framework for bi-objective unmanned surface vessel (USV) scheduling under complex marine constraints, aiming to simultaneously minimize the maximum completion time and the total collision risk index. First, a bi-objective mathematical model is formulated that accounts for three constraints: battery capacity, marine obstacles, and uncertain task execution times. Second, four meta-heuristics are adapted and improved to solve the problem, and seven problem-specific local search operators are designed to enhance their performance. Third, two state-reward strategies are designed and each is integrated into Q-learning and SARSA, yielding four reinforcement learning (RL) algorithms. These four RL algorithms are trained offline and then used to select the best local search operator at each iteration of the meta-heuristics, improving search efficiency. Finally, the proposed algorithms are evaluated on 10 cases of different scales. The experimental results and statistical tests verify the efficiency of the local search operators and show that the four RL algorithms further improve the meta-heuristics' performance. Particle swarm optimization (PSO) combined with Q-learning under the second state-reward strategy (PSO_QL2) is the most competitive of all compared algorithms.
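The operator-selection idea in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' implementation: the abstract does not define the two state-reward strategies, so the state count `N_STATES`, the reward values, and the learning parameters `alpha`, `gamma`, and `epsilon` are all hypothetical. Only the number of operators (seven) and the use of Q-learning and SARSA come from the abstract.

```python
import random

N_OPERATORS = 7  # seven local search operators, as stated in the abstract
N_STATES = 3     # hypothetical: the abstract does not specify the state encoding


def new_q_table():
    """Create a zero-initialised table with one row per state, one column per operator."""
    return [[0.0] * N_OPERATORS for _ in range(N_STATES)]


def select_operator(q_table, state, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(N_OPERATORS)
    row = q_table[state]
    return max(range(N_OPERATORS), key=row.__getitem__)


def q_learning_update(q_table, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best operator value in the next state."""
    target = r + gamma * max(q_table[s_next])
    q_table[s][a] += alpha * (target - q_table[s][a])


def sarsa_update(q_table, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the operator actually chosen in the next state."""
    target = r + gamma * q_table[s_next][a_next]
    q_table[s][a] += alpha * (target - q_table[s][a])


if __name__ == "__main__":
    rng = random.Random(0)
    q = new_q_table()
    q[1][2] = 1.0  # pretend training found operator 2 useful in state 1
    q_learning_update(q, s=0, a=3, r=1.0, s_next=1)
    sarsa_update(q, s=0, a=4, r=1.0, s_next=1, a_next=0)
    # With epsilon=0 the frozen table is used greedily, as after offline training.
    print(select_operator(q, state=1, epsilon=0.0, rng=rng))
```

In an offline setting like the one described, the table would be filled during a separate training phase and then frozen; inside the meta-heuristic loop, `select_operator` would be called with `epsilon=0.0` so that no further exploration or updating takes place.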




Source journal: Swarm and Evolutionary Computation

Categories: Computer Science, Artificial Intelligence; Computer Science, Theory & Methods

CiteScore: 16.00

Self-citation rate: 12.00%

Articles published: 169

About the journal: Swarm and Evolutionary Computation is a peer-reviewed journal focused on the latest research and advancements in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes the publication of high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence. Additionally, it welcomes survey papers on current topics and novel applications. Topics of interest include but are not limited to: Genetic Algorithms and Genetic Programming, Evolution Strategies and Evolutionary Programming, Differential Evolution, Artificial Immune Systems, Particle Swarms, Ant Colony, Bacterial Foraging, Artificial Bees, Firefly Algorithm, Harmony Search, Artificial Life, Digital Organisms, Estimation of Distribution Algorithms, Stochastic Diffusion Search, Quantum Computing, Nano Computing, Membrane Computing, Human-centric Computing, Hybridization of Algorithms, Memetic Computing, Autonomic Computing, Self-organizing Systems, and Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.