Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling
Jesse van Remmerden, Zaharah Bukhsh, Yingqian Zhang
arXiv - CS - Machine Learning, 2024-09-16. DOI: https://doi.org/arxiv-2409.10589
Abstract
The Job Shop Scheduling Problem (JSSP) is a complex combinatorial optimization problem. There has been growing interest in using online Reinforcement Learning (RL) for JSSP. While online RL can quickly find acceptable solutions, especially for larger problems, it produces lower-quality results than traditional methods such as Constraint Programming (CP). A significant downside of online RL is that it cannot learn from existing data, such as solutions generated by CP; it must be trained from scratch, which leads to sample inefficiency and prevents it from learning from more optimal examples. We introduce Offline Reinforcement Learning for Learning to Dispatch (Offline-LD), a novel approach for JSSP that addresses these limitations. Offline-LD adapts two CQL-based Q-learning methods (mQRDQN and discrete mSAC) for maskable action spaces, introduces a new entropy bonus modification for discrete SAC, and exploits reward normalization through preprocessing. Our experiments show that Offline-LD outperforms online RL on both generated and benchmark instances. By introducing noise into the dataset, we achieve results similar to or better than those obtained from the expert dataset, indicating that a more diverse training set is preferable because it contains counterfactual information.
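The abstract names two mechanisms that lend themselves to a short illustration: masking invalid dispatch actions when selecting actions from Q-values, and normalizing rewards in the offline dataset as a preprocessing step. The sketch below is not from the paper; the function names (masked_argmax, normalize_rewards) and the exact masking/normalization details are assumptions made for illustration only.

```python
import torch


def masked_argmax(q_values: torch.Tensor, action_mask: torch.Tensor) -> torch.Tensor:
    """Greedy action selection restricted to valid (maskable) actions.

    q_values:    (batch, n_actions) Q-value estimates
    action_mask: (batch, n_actions) boolean, True where the action is currently valid
    """
    # Invalid actions are set to -inf so argmax can never pick them.
    masked_q = q_values.masked_fill(~action_mask, float("-inf"))
    return masked_q.argmax(dim=-1)


def normalize_rewards(rewards: torch.Tensor) -> torch.Tensor:
    """Standardize the rewards of an offline dataset before training
    (zero mean, unit variance); the paper's exact scheme may differ."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)


# Example usage with dummy data (two JSSP states, five candidate dispatch actions).
q = torch.randn(2, 5)
mask = torch.tensor([[1, 0, 1, 1, 0],
                     [0, 1, 1, 0, 1]], dtype=torch.bool)
print(masked_argmax(q, mask))               # greedy valid action per instance
print(normalize_rewards(torch.randn(100)))  # normalized offline rewards
```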