考虑可变加工时间的动态柔性车间调度问题的深度强化学习

IF 14.2 1区工程技术 Q1 ENGINEERING, INDUSTRIAL

Journal of Manufacturing Systems Pub Date : 2023-09-27 DOI:10.1016/j.jmsy.2023.09.009

Lu Zhang , Yi Feng , Qinge Xiao , Yunlang Xu , Di Li , Dongsheng Yang , Zhile Yang

{"title":"考虑可变加工时间的动态柔性车间调度问题的深度强化学习","authors":"Lu Zhang , Yi Feng , Qinge Xiao , Yunlang Xu , Di Li , Dongsheng Yang , Zhile Yang","doi":"10.1016/j.jmsy.2023.09.009","DOIUrl":null,"url":null,"abstract":"<div><p><span><span>In recent years, the uncertainties and complexity in the production process, due to the boosted customized requirements, has dramatically increased the difficulties of Dynamic Flexible Job Shop Scheduling (DFJSP). This paper investigates a new DFJSP model taking into account the minimum completion time under the condition of machine processing time uncertainty, e.t. VPT-FJSP problem. In the formulated VPT-FJSP process, each workpiece needs to be processed by required machine at a certain time slot where Markov decision process (MDP) and reinforcement learning methods are adopted to solve VPT-FJSP. The agent designed in this paper employs the Proximal Policy Optimization(PPO) algorithm in deep reinforcement learning, which includes the Actor-Critic network. The input of the network is to extract the processing information matrix and to embed some advanced states in the workshop by graph neural network, which enables the agent to learn the complete state of the environment. Finally, we train and test the proposed framework on the canonical FJSP benchmark, and the experimental results show that our framework can make agent better than </span>genetic algorithm and </span>ant colony optimization in most cases, 94.29% of static scheduling. It is also shown superiority compared to the scheduling rules in dynamic environment and has demonstrated strong robustness in solving VPT-FJSP. Furthermore, this study conducted tests to assess the generalization capability of the agent on VPT-FJSP at different scales. In terms of exploring Makespan minimization, the agent outperformed four priority scheduling rules. These results indicate that the proposed dynamic scheduling framework and PPO algorithm are more effective in achieving superior solutions.</p></div>","PeriodicalId":16227,"journal":{"name":"Journal of Manufacturing Systems","volume":"71 ","pages":"Pages 257-273"},"PeriodicalIF":14.2000,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Deep reinforcement learning for dynamic flexible job shop scheduling problem considering variable processing times\",\"authors\":\"Lu Zhang , Yi Feng , Qinge Xiao , Yunlang Xu , Di Li , Dongsheng Yang , Zhile Yang\",\"doi\":\"10.1016/j.jmsy.2023.09.009\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p><span><span>In recent years, the uncertainties and complexity in the production process, due to the boosted customized requirements, has dramatically increased the difficulties of Dynamic Flexible Job Shop Scheduling (DFJSP). This paper investigates a new DFJSP model taking into account the minimum completion time under the condition of machine processing time uncertainty, e.t. VPT-FJSP problem. In the formulated VPT-FJSP process, each workpiece needs to be processed by required machine at a certain time slot where Markov decision process (MDP) and reinforcement learning methods are adopted to solve VPT-FJSP. The agent designed in this paper employs the Proximal Policy Optimization(PPO) algorithm in deep reinforcement learning, which includes the Actor-Critic network. The input of the network is to extract the processing information matrix and to embed some advanced states in the workshop by graph neural network, which enables the agent to learn the complete state of the environment. Finally, we train and test the proposed framework on the canonical FJSP benchmark, and the experimental results show that our framework can make agent better than </span>genetic algorithm and </span>ant colony optimization in most cases, 94.29% of static scheduling. It is also shown superiority compared to the scheduling rules in dynamic environment and has demonstrated strong robustness in solving VPT-FJSP. Furthermore, this study conducted tests to assess the generalization capability of the agent on VPT-FJSP at different scales. In terms of exploring Makespan minimization, the agent outperformed four priority scheduling rules. These results indicate that the proposed dynamic scheduling framework and PPO algorithm are more effective in achieving superior solutions.</p></div>\",\"PeriodicalId\":16227,\"journal\":{\"name\":\"Journal of Manufacturing Systems\",\"volume\":\"71 \",\"pages\":\"Pages 257-273\"},\"PeriodicalIF\":14.2000,\"publicationDate\":\"2023-09-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Manufacturing Systems\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0278612523001917\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, INDUSTRIAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Manufacturing Systems","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0278612523001917","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, INDUSTRIAL","Score":null,"Total":0}

引用次数: 0

摘要

近年来，由于定制需求的增加，生产过程中的不确定性和复杂性大大增加了动态柔性车间调度（DFJSP）的难度。本文研究了在机器加工时间不确定的情况下，考虑最小完成时间的DFJSP新模型，即VPT-FJSP问题。在公式化的VPT-FJSP过程中，每个工件都需要在特定的时间段由所需的机器进行处理，其中采用马尔可夫决策过程（MDP）和强化学习方法来求解VPT-FJSP。本文设计的agent在深度强化学习中采用了近端策略优化（PPO）算法，其中包括Actor-Critic网络。网络的输入是提取加工信息矩阵，并通过图神经网络将一些高级状态嵌入到车间中，使智能体能够学习环境的完整状态。最后，我们在标准FJSP基准上对所提出的框架进行了训练和测试，实验结果表明，在大多数情况下，我们的框架可以使agent优于遗传算法和蚁群优化，静态调度的效率为94.29%。与动态环境中的调度规则相比，它也显示出了优越性，并在求解VPT-FJSP时表现出了强大的鲁棒性。此外，本研究还进行了测试，以评估该药剂在不同尺度上对VPT-FJSP的泛化能力。在探索Makespan最小化方面，该代理优于四个优先级调度规则。这些结果表明，所提出的动态调度框架和PPO算法在获得优越的解决方案方面更有效。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Deep reinforcement learning for dynamic flexible job shop scheduling problem considering variable processing times

In recent years, the uncertainties and complexity in the production process, due to the boosted customized requirements, has dramatically increased the difficulties of Dynamic Flexible Job Shop Scheduling (DFJSP). This paper investigates a new DFJSP model taking into account the minimum completion time under the condition of machine processing time uncertainty, e.t. VPT-FJSP problem. In the formulated VPT-FJSP process, each workpiece needs to be processed by required machine at a certain time slot where Markov decision process (MDP) and reinforcement learning methods are adopted to solve VPT-FJSP. The agent designed in this paper employs the Proximal Policy Optimization(PPO) algorithm in deep reinforcement learning, which includes the Actor-Critic network. The input of the network is to extract the processing information matrix and to embed some advanced states in the workshop by graph neural network, which enables the agent to learn the complete state of the environment. Finally, we train and test the proposed framework on the canonical FJSP benchmark, and the experimental results show that our framework can make agent better than genetic algorithm and ant colony optimization in most cases, 94.29% of static scheduling. It is also shown superiority compared to the scheduling rules in dynamic environment and has demonstrated strong robustness in solving VPT-FJSP. Furthermore, this study conducted tests to assess the generalization capability of the agent on VPT-FJSP at different scales. In terms of exploring Makespan minimization, the agent outperformed four priority scheduling rules. These results indicate that the proposed dynamic scheduling framework and PPO algorithm are more effective in achieving superior solutions.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Manufacturing Systems 工程技术-工程：工业

CiteScore

23.30

自引率

13.20%

发文量

216

审稿时长

25 days

期刊介绍： The Journal of Manufacturing Systems is dedicated to showcasing cutting-edge fundamental and applied research in manufacturing at the systems level. Encompassing products, equipment, people, information, control, and support functions, manufacturing systems play a pivotal role in the economical and competitive development, production, delivery, and total lifecycle of products, meeting market and societal needs. With a commitment to publishing archival scholarly literature, the journal strives to advance the state of the art in manufacturing systems and foster innovation in crafting efficient, robust, and sustainable manufacturing systems. The focus extends from equipment-level considerations to the broader scope of the extended enterprise. The Journal welcomes research addressing challenges across various scales, including nano, micro, and macro-scale manufacturing, and spanning diverse sectors such as aerospace, automotive, energy, and medical device manufacturing.