Offline reinforcement learning for learning to dispatch for job shop scheduling.

IF 2.9 · CAS Tier 3 (Computer Science) · JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Machine Learning · Pub Date: 2025-01-01 · Epub Date: 2025-07-15 · DOI: 10.1007/s10994-025-06826-w
Jesse van Remmerden, Zaharah Bukhsh, Yingqian Zhang
{"title":"Offline reinforcement learning for learning to dispatch for job shop scheduling.","authors":"Jesse van Remmerden, Zaharah Bukhsh, Yingqian Zhang","doi":"10.1007/s10994-025-06826-w","DOIUrl":null,"url":null,"abstract":"<p><p>The Job Shop Scheduling Problem (JSSP) is a complex combinatorial optimization problem. While online Reinforcement Learning (RL) has shown promise by quickly finding acceptable solutions for JSSP, it faces key limitations: it requires extensive training interactions from scratch leading to sample inefficiency, cannot leverage existing high-quality solutions from traditional methods like Constraint Programming (CP), and require simulated environments to train in, which are impracticable to build for complex scheduling environments. We introduce Offline Learned Dispatching (Offline-LD), an offline reinforcement learning approach for JSSP, which addresses these limitations by learning from historical scheduling data. Our approach is motivated by scenarios where historical scheduling data and expert solutions are available or scenarios where online training of RL approaches with simulated environments is impracticable. Offline-LD introduces maskable variants of two Q-learning methods, namely, Maskable Quantile Regression DQN (mQRDQN) and discrete maskable Soft Actor-Critic (d-mSAC), that are able to learn from historical data, through Conservative Q-Learning (CQL), whereby we present a novel entropy bonus modification for d-mSAC, for maskable action spaces. Moreover, we introduce a novel reward normalization method for JSSP in an offline RL setting. Our experiments demonstrate that Offline-LD outperforms online RL on both generated and benchmark instances when trained on only 100 solutions generated by CP. Notably, introducing noise to the expert dataset yields comparable or superior results to using the expert dataset, with the same amount of instances, a promising finding for real-world applications, where data is inherently noisy and imperfect.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"114 8","pages":"191"},"PeriodicalIF":2.9000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12263752/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Learning","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10994-025-06826-w","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/7/15 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

The Job Shop Scheduling Problem (JSSP) is a complex combinatorial optimization problem. While online Reinforcement Learning (RL) has shown promise by quickly finding acceptable solutions for JSSP, it faces key limitations: it requires extensive training interactions from scratch, leading to sample inefficiency; it cannot leverage existing high-quality solutions from traditional methods such as Constraint Programming (CP); and it requires simulated training environments, which are impractical to build for complex scheduling settings. We introduce Offline Learned Dispatching (Offline-LD), an offline reinforcement learning approach for JSSP that addresses these limitations by learning from historical scheduling data. Our approach is motivated by scenarios where historical scheduling data and expert solutions are available, or where online training of RL approaches in simulated environments is impracticable. Offline-LD introduces maskable variants of two Q-learning methods, Maskable Quantile Regression DQN (mQRDQN) and discrete maskable Soft Actor-Critic (d-mSAC), which learn from historical data through Conservative Q-Learning (CQL); for d-mSAC, we present a novel entropy-bonus modification for maskable action spaces. Moreover, we introduce a novel reward normalization method for JSSP in an offline RL setting. Our experiments demonstrate that Offline-LD outperforms online RL on both generated and benchmark instances when trained on only 100 solutions generated by CP. Notably, introducing noise into the expert dataset yields comparable or superior results to using the expert dataset with the same number of instances, a promising finding for real-world applications, where data is inherently noisy and imperfect.
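For readers who want a concrete picture of the mechanics named in the abstract, the following PyTorch-style sketch illustrates, under stated assumptions and not as the authors' implementation, how a CQL penalty and a discrete-SAC entropy term are typically restricted to a masked action space. All names here (masked_cql_loss, masked_entropy, cql_alpha) are hypothetical; the paper's specific entropy-bonus modification and its reward normalization scheme are defined only in the full text.

```python
# Minimal sketch: a Conservative Q-Learning loss and a policy entropy term,
# both restricted to a masked (valid-action) space. Hypothetical names;
# not the Offline-LD implementation.
import torch
import torch.nn.functional as F

def masked_cql_loss(q_values, actions, targets, action_mask, cql_alpha=1.0):
    """q_values:    (B, A) critic outputs Q(s, .)
    actions:     (B,)   actions taken in the historical dataset
    targets:     (B,)   Bellman targets, e.g. r + gamma * max_a' Q_target(s', a')
    action_mask: (B, A) bool, True where the operation is schedulable
    """
    # TD error on the actions actually taken in the historical data.
    q_taken = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    td_loss = F.mse_loss(q_taken, targets)

    # CQL penalty: the logsumexp pushes down Q over all *valid* actions,
    # while q_taken pushes up Q on dataset actions. Invalid actions are
    # excluded by masking them to -inf before the logsumexp.
    masked_q = q_values.masked_fill(~action_mask, float("-inf"))
    cql_penalty = (torch.logsumexp(masked_q, dim=1) - q_taken).mean()
    return td_loss + cql_alpha * cql_penalty

def masked_entropy(logits, action_mask):
    """Entropy of the policy over valid actions only, as used in
    discrete SAC-style entropy bonuses. Masked actions receive zero
    probability and contribute nothing to the entropy."""
    masked_logits = logits.masked_fill(~action_mask, float("-inf"))
    log_probs = F.log_softmax(masked_logits, dim=1)
    probs = log_probs.exp()
    # clamp avoids 0 * (-inf) = nan on masked entries
    return -(probs * log_probs.clamp_min(-1e9)).sum(dim=1)
```

Restricting the logsumexp to valid actions matters in JSSP because the set of schedulable operations shrinks as the schedule fills; penalizing Q-values of actions that could never be taken would distort the conservatism term.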

Source Journal

Machine Learning (Engineering & Technology - Computer Science: Artificial Intelligence)
CiteScore: 11.00
Self-citation rate: 2.70%
Articles per year: 162
Review time: 3 months
Journal Description: Machine Learning serves as a global platform dedicated to computational approaches in learning. The journal reports substantial findings on diverse learning methods applied to various problems, offering support through empirical studies, theoretical analysis, or connections to psychological phenomena. It demonstrates the application of learning methods to solve significant problems and aims to enhance the conduct of machine learning research, with a focus on verifiable and replicable evidence in published papers.