On the integration of reinforcement learning and simulated annealing for the parallel batch scheduling problem with setups

IF 6.0, Zone 2 (Management), Q1 OPERATIONS RESEARCH & MANAGEMENT SCIENCE

European Journal of Operational Research Pub Date : 2025-05-02 DOI:10.1016/j.ejor.2025.04.042

Gustavo Alencar Rolim, Caio Paziani Tomazella, Marcelo Seido Nagano


Citations: 0

Abstract

Motivated by semiconductor applications, where wafer lots are grouped into families and processed on batch machines, this paper addresses a generalized unrelated parallel-batch scheduling problem. The goal is to minimize total completion time (flow time) while considering family- and machine-dependent setup times. We propose a mixed-integer programming formulation, establish a necessary condition for optimal schedules, and develop a polynomial-time heuristic for batching and sequencing. We also evaluate Q-Learning, a model-free reinforcement learning algorithm, for neighborhood selection within two Simulated Annealing-based metaheuristics: Stochastic Local Search (SLS) and Adaptive Large Neighborhood Search (ALNS). Results show that SLS and ALNS achieve better solutions and faster convergence compared to existing approaches. Finally, we conclude that while Q-Learning has the potential to improve solution quality in certain cases, it also increases the complexity of the algorithms, making them harder to configure and scale.
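To make the idea of Q-Learning-driven neighborhood selection inside Simulated Annealing concrete, here is a minimal sketch. It is not the authors' SLS or ALNS: the instance data, the constant `SETUP` time, the two moves (`swap_move`, `insert_move`), the state definition (the previously applied neighborhood), the reward, and all parameter values are assumptions chosen for illustration. The sketch also simplifies the problem to a single machine without batching, keeping only family-dependent setups and the total completion time objective.

```python
import math
import random

# Toy instance (hypothetical data): (processing_time, family) per job.
JOBS = [(4, 0), (2, 1), (6, 0), (3, 1), (5, 2), (1, 2), (7, 0), (2, 1)]
SETUP = 3  # assumed constant setup time, paid whenever the family changes


def total_completion_time(seq):
    """Sum of job completion times for a job sequence on one machine."""
    t, total, prev_family = 0, 0, None
    for j in seq:
        p, fam = JOBS[j]
        if fam != prev_family:
            t += SETUP  # setup before the first job and at each family change
        t += p
        total += t
        prev_family = fam
    return total


def swap_move(seq, rng):
    """Neighborhood 0: exchange two randomly chosen positions."""
    i, k = rng.sample(range(len(seq)), 2)
    new = seq[:]
    new[i], new[k] = new[k], new[i]
    return new


def insert_move(seq, rng):
    """Neighborhood 1: remove one job and reinsert it at a random position."""
    new = seq[:]
    job = new.pop(rng.randrange(len(new)))
    new.insert(rng.randrange(len(new) + 1), job)
    return new


NEIGHBORHOODS = [swap_move, insert_move]


def q_sa(iters=5000, t0=20.0, cooling=0.999, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    n_act = len(NEIGHBORHOODS)
    # Q-table indexed by [state][action]; state = last neighborhood used (assumed design).
    q = [[0.0] * n_act for _ in range(n_act)]
    cur = list(range(len(JOBS)))
    rng.shuffle(cur)
    cur_cost = total_completion_time(cur)
    best, best_cost = cur[:], cur_cost
    state, temp = 0, t0
    for _ in range(iters):
        # Epsilon-greedy choice of the neighborhood to apply.
        if rng.random() < eps:
            action = rng.randrange(n_act)
        else:
            action = max(range(n_act), key=lambda a: q[state][a])
        cand = NEIGHBORHOODS[action](cur, rng)
        cand_cost = total_completion_time(cand)
        delta = cand_cost - cur_cost
        reward = max(0.0, -delta)  # assumed reward: improvement over the current solution
        # Metropolis acceptance rule of Simulated Annealing.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = cur[:], cur_cost
        # Standard Q-Learning update.
        next_state = action
        q[state][action] += alpha * (reward + gamma * max(q[next_state]) - q[state][action])
        state = next_state
        temp *= cooling  # geometric cooling schedule
    return best, best_cost, q


if __name__ == "__main__":
    best, best_cost, q = q_sa()
    print("best sequence:", best, "total completion time:", best_cost)
    print("Q-table:", q)
```

Even in this small sketch, the learning layer adds four parameters (`alpha`, `gamma`, `eps`, and the state and reward design) on top of the annealing schedule, which illustrates the configuration burden the abstract mentions.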


Source journal

European Journal of Operational Research (Management Science: Operations Research & Management Science)

CiteScore: 11.90

Self-citation rate: 9.40%

Articles published: 786

Review time: 8.2 months

Journal description: The European Journal of Operational Research (EJOR) publishes high-quality, original papers that contribute to the methodology of operational research (OR) and to the practice of decision making.