A dynamic scheduling method with Conv-Dueling and generalized representation based on reinforcement learning

IF 1.8 | Zone 3, Engineering Technology | JCR Q4, ENGINEERING, INDUSTRIAL

International Journal of Industrial Engineering Computations | Pub Date: 2023-01-01 | DOI: 10.5267/j.ijiec.2023.6.003

Minghao Xia, Haibin Liu, Mingfei Li, Long Wang


Citations: 1

Abstract

In modern industrial manufacturing, uncertain dynamic disturbances affecting processing machines and jobs can disrupt the original production plan. This research focuses on dynamic multi-objective flexible scheduling problems that involve multi-constraint relationships among machines, jobs, and uncertain disturbance events. The possible disturbance events include job insertion, machine breakdown, and processing time change. The paper proposes a conv-dueling network model, a multidimensional state representation of the job processing information, and multiple scheduling objectives: minimizing makespan and delay time while maximizing the completion punctuality rate. We design a multidimensional state space that includes job and machine processing information, an efficient and complete scheduling action space for the intelligent agent, and a compound scheduling reward function that combines the main task and the branch task. The network model is trained without supervision using the dueling double deep Q-network (D3QN) algorithm. Finally, based on the multi-constraint, multi-disturbance production environment information, the multidimensional state representation matrix of the job is used as input, and the optimal scheduling rule is output after feature extraction and decision making by the conv-dueling network model. Simulation experiments are carried out on 50 test cases. The results show that the proposed conv-dueling network model converges quickly under the DQN, DDQN, and D3QN algorithms and has good stability and universality. The experimental results indicate that the proposed scheduling algorithm outperforms DQN, DDQN, and single scheduling algorithms on all three scheduling objectives. It also demonstrates high robustness and excellent comprehensive scheduling performance.
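The abstract names the D3QN algorithm but does not spell out its update rule. As a rough illustration only, the sketch below shows the two standard ingredients that the name usually refers to: the dueling aggregation of a state value and per-action advantages (using the common mean-advantage baseline), and the double DQN target in which the online network selects the next action and the target network evaluates it. The function names, the three-action example, and all numbers are hypothetical and are not taken from the paper; the paper's convolutional feature extractor, state matrix, and reward function are not reproduced here.

```python
from typing import List, Sequence


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) using the mean-advantage baseline."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def double_dqn_target(reward: float, gamma: float, done: bool,
                      next_q_online: Sequence[float],
                      next_q_target: Sequence[float]) -> float:
    """Double DQN target: online net picks the action, target net scores it."""
    if done:
        # Terminal transition: no bootstrapped future value.
        return reward
    best_action = max(range(len(next_q_online)), key=lambda i: next_q_online[i])
    return reward + gamma * next_q_target[best_action]


# Hypothetical example: three candidate scheduling rules act as the actions.
q_online = dueling_q_values(state_value=1.0, advantages=[0.5, -0.5, 0.0])
q_target = dueling_q_values(state_value=0.8, advantages=[0.1, 0.4, -0.5])
target = double_dqn_target(reward=1.0, gamma=0.9, done=False,
                           next_q_online=q_online, next_q_target=q_target)
print(q_online, target)
```

In this toy setting the online estimates choose the first rule, while the bootstrapped value comes from the target estimates. That decoupling is what double DQN relies on to reduce overestimation of Q-values.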


Source journal