Probing an LSTM-PPO-Based reinforcement learning algorithm to solve dynamic job shop scheduling problem

IF 6.5 | Zone 1, Engineering & Technology | JCR Q1: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computers & Industrial Engineering, Volume 197, Article 110633 | Pub Date: 2024-10-14 | DOI: 10.1016/j.cie.2024.110633

Wei Chen, Zequn Zhang, Dunbing Tang, Changchun Liu, Yong Gui, Qingwei Nie, Zhen Zhao


Citations: 0

Abstract


Probing an LSTM-PPO-Based reinforcement learning algorithm to solve dynamic job shop scheduling problem

With the growth of personalized demand and the continuous improvement in social productivity, the large-scale and few-variety centralized production model is gradually transitioning towards a personalized model of small batches and multiple varieties, which makes the manufacturing process of the job shop increasingly complex. Furthermore, disruptive events such as machinery failures and rush orders in the job shop increase the uncertainty and variability of the production environment. Traditional scheduling methods are usually based on fixed rules and heuristic algorithms, which are difficult to adapt to constantly changing production environments and demands. This may lead to inaccurate scheduling decisions and hinder the optimal allocation of job shop resources. To solve the dynamic job shop scheduling problem (JSP) more effectively, this paper proposes a Reinforcement Learning (RL) optimization algorithm integrating long short-term memory (LSTM) neural network and proximal policy optimization (PPO). It can dynamically adjust scheduling strategies according to the changing production environment, achieving comprehensive status awareness of the job shop environment to make optimal scheduling decisions. First, a state-aware network framework based on LSTM-PPO is proposed to achieve real-time perception of job shop state changes. Then, the state and action space of the job shop are described within the context of the state-aware network framework. Finally, an experimental environment is established to verify the algorithm’s effectiveness. Training the LSTM-PPO algorithm makes it feasible to achieve better performance than other scheduling methods. By comparing the initial planning time with the actual completion time of the rescheduling decision under different dynamic disturbances, the efficiency of the proposed algorithm is verified for the dynamic JSP.
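As a rough illustration of the PPO half of the method, the sketch below computes the standard clipped surrogate objective used by proximal policy optimization. It is not the paper's implementation: the function name `ppo_clip_objective`, the `clip_eps=0.2` default, and the toy numbers are assumptions made for illustration. The abstract does not state the loss details or hyperparameters, and the LSTM state encoder is omitted here.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and the data-collecting policy.
    advantages: advantage estimates for those actions.
    clip_eps: clipping range (0.2 is an assumed, commonly used value).
    """
    if not (len(new_logp) == len(old_logp) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Toy batch (hypothetical numbers): three scheduling decisions.
old_logp = [math.log(0.4), math.log(0.5), math.log(0.5)]
new_logp = [math.log(0.8), math.log(0.5), math.log(0.25)]
advantages = [1.0, 2.0, -1.0]
objective = ppo_clip_objective(new_logp, old_logp, advantages)
```

In the toy batch, the first decision's ratio of 2 is clipped to 1.2, so the policy gains nothing from pushing that action's probability further in a single update. This bounded step size is what makes PPO a plausible fit for an environment whose state shifts under machine failures and rush orders.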


Source journal

Computers & Industrial Engineering (Engineering & Technology: Industrial Engineering)

CiteScore: 12.70

Self-citation rate: 12.70%

Articles published: 794

Review time: 10.6 months

Journal description: Computers & Industrial Engineering (CAIE) is dedicated to researchers, educators, and practitioners in industrial engineering and related fields. Pioneering the integration of computers in research, education, and practice, industrial engineering has evolved to make computers and electronic communication integral to its domain. CAIE publishes original contributions focusing on the development of novel computerized methodologies to address industrial engineering problems. It also highlights the applications of these methodologies to issues within the broader industrial engineering and associated communities. The journal actively encourages submissions that push the boundaries of fundamental theories and concepts in industrial engineering techniques.