A Reinforcement Learning Approach for Scheduling Problems with Improved Generalization through Order Swapping

Impact Factor: 4.0; JCR: Q2 (Computer Science, Artificial Intelligence)

Machine Learning and Knowledge Extraction. Pub Date: 2023-04-29. DOI: 10.3390/make5020025

Deepak Vivekanandan, Samuel Wirth, Patrick Karlbauer, Noah Klarmann

Abstract

The scheduling of production resources (such as assigning jobs to machines) plays a vital role in the manufacturing industry, not only for saving energy but also for increasing overall efficiency. Among the different job scheduling problems, this work addresses the Job Shop Scheduling Problem (JSSP). The JSSP belongs to the class of NP-hard Combinatorial Optimization Problems (COPs), for which solving the problem through exhaustive search becomes infeasible. Simple heuristics such as First-In, First-Out and Largest Processing Time First, and metaheuristics such as tabu search, are often adopted to solve the problem by truncating the search space. These methods become impractical for large problem sizes, because their solutions are either far from the optimum or time-consuming to compute. In recent years, research on using Deep Reinforcement Learning (DRL) to solve COPs has gained interest and has shown promising results in terms of solution quality and computational efficiency. In this work, we present a novel DRL approach to solving the JSSP that examines two objectives: generalization and solution effectiveness. In particular, we employ the Proximal Policy Optimization (PPO) algorithm, a policy-gradient method that has been found to perform well in the constrained dispatching of jobs. We incorporate a new method called the Order Swapping Mechanism (OSM) into the environment to achieve better generalized learning of the problem. The performance of the presented approach is analyzed in depth using a set of available benchmark instances and by comparing our results with the work of other groups.
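
To make the baseline heuristics named in the abstract concrete, the following is a minimal sketch of a greedy JSSP dispatcher with First-In, First-Out and Largest Processing Time First rules. It is an illustration only, not the paper's PPO agent or its Order Swapping Mechanism. The three-job instance, the function names (`dispatch`, `fifo`, `lpt`), and the tie-breaking by job index are all assumptions made for this example.

```python
from typing import Callable, List, Tuple

# A job is an ordered list of operations: (machine_id, processing_time).
Job = List[Tuple[int, int]]
Rule = Callable[[List[int], List[Job], List[int], List[int]], int]


def fifo(candidates: List[int], jobs: List[Job],
         next_op: List[int], job_ready: List[int]) -> int:
    """Pick the job whose next operation became available earliest."""
    return min(candidates, key=lambda j: (job_ready[j], j))


def lpt(candidates: List[int], jobs: List[Job],
        next_op: List[int], job_ready: List[int]) -> int:
    """Pick the job whose next operation has the largest processing time."""
    return min(candidates, key=lambda j: (-jobs[j][next_op[j]][1], j))


def dispatch(jobs: List[Job], rule: Rule) -> int:
    """Greedily schedule one operation at a time and return the makespan.

    Each chosen operation is appended after the last operation on its
    machine (no gap filling), respecting the job's operation order.
    """
    n_machines = 1 + max(m for job in jobs for m, _ in job)
    next_op = [0] * len(jobs)          # index of each job's next operation
    job_ready = [0] * len(jobs)        # time each job's next operation may start
    machine_ready = [0] * n_machines   # time each machine becomes free

    while True:
        candidates = [j for j in range(len(jobs)) if next_op[j] < len(jobs[j])]
        if not candidates:
            break
        j = rule(candidates, jobs, next_op, job_ready)
        machine, duration = jobs[j][next_op[j]]
        start = max(job_ready[j], machine_ready[machine])
        end = start + duration
        job_ready[j] = end
        machine_ready[machine] = end
        next_op[j] += 1

    return max(job_ready)


# Hypothetical toy instance: 3 jobs on 3 machines.
JOBS: List[Job] = [
    [(0, 3), (1, 2), (2, 2)],
    [(0, 2), (2, 1), (1, 4)],
    [(1, 4), (2, 3)],
]

if __name__ == "__main__":
    print("FIFO makespan:", dispatch(JOBS, fifo))
    print("LPT makespan:", dispatch(JOBS, lpt))
```

On this toy instance the two rules produce different makespans, which illustrates why a fixed priority rule can land far from the optimum. A learned policy would take the place of the `rule` argument and choose the next job from the observed state.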
