Reinforcement Learning as an Improvement Heuristic for Real-World Production Scheduling

Arthur Müller, Lukas Vollenkemper
arXiv - CS - Machine Learning · Published 2024-09-18 · DOI: arxiv-2409.11933 (https://doi.org/arxiv-2409.11933)

Abstract

The integration of Reinforcement Learning (RL) with heuristic methods is an emerging trend for solving optimization problems, which leverages RL's ability to learn from the data generated during the search process. One promising approach is to train an RL agent as an improvement heuristic, starting with a suboptimal solution that is iteratively improved by applying small changes. We apply this approach to a real-world multiobjective production scheduling problem. Our approach utilizes a network architecture that includes Transformer encoding to learn the relationships between jobs. Afterwards, a probability matrix is generated from which pairs of jobs are sampled and then swapped to improve the solution. We benchmarked our approach against other heuristics using real data from our industry partner, demonstrating its superior performance.
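The improvement loop the abstract describes — start from a suboptimal schedule, sample a pair of jobs from a probability matrix, swap them, and keep the change if it helps — can be sketched in a few lines. The sketch below is a simplified illustration, not the paper's method: the objective (`total_tardiness`), the instance data, and the uniform pair distribution are all assumptions; in the paper the pair probabilities come from a learned Transformer policy and the objective is multiobjective.

```python
import random

def total_tardiness(order, durations, due):
    # Jobs run sequentially in the given order on one machine;
    # cost is the summed tardiness max(0, completion - due date).
    t, tard = 0, 0
    for j in order:
        t += durations[j]
        tard += max(0, t - due[j])
    return tard

def improve_by_swaps(order, durations, due, steps=2000, seed=0):
    # Iteratively improve a schedule by sampling job pairs and swapping them.
    # Here pairs are sampled uniformly; a learned policy would instead supply
    # a probability matrix over job pairs, as in the paper.
    rng = random.Random(seed)
    best = list(order)
    best_cost = total_tardiness(best, durations, due)
    for _ in range(steps):
        i, j = rng.sample(range(len(best)), 2)
        cand = list(best)
        cand[i], cand[j] = cand[j], cand[i]  # apply the small change
        cost = total_tardiness(cand, durations, due)
        if cost < best_cost:  # greedy acceptance of improving swaps
            best, best_cost = cand, cost
    return best, best_cost
```

Usage: `improve_by_swaps([0, 1, 2, 3, 4], [4, 2, 7, 3, 1], [5, 3, 16, 7, 2])` returns a permutation whose tardiness is no worse than the starting order's.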