{"title":"考虑时间变量和新作业到达的车间动态调度问题的深度强化学习方法","authors":"Haoyang Yu , Wenbin Gu , Na Tang , Zhenyang Guo","doi":"10.1016/j.cor.2025.107263","DOIUrl":null,"url":null,"abstract":"<div><div>In recent years, the complexity of the production process due to increased demand for customization has greatly increased the difficulty of dynamic job-shop scheduling problem (DJSP). This paper proposes a deep reinforcement learning (DRL) approach to tackle the DJSP based on proximal policy optimization (PPO) algorithm. A novel state representation method that expresses state features as multi-channel images is proposed to simplify the state characterization process. Various heuristic-based priority dispatching rules (PDRs)are used to construct action space. By converting scheduling instances into images and leveraging the spatial pyramid pooling fast (SPPF) module for feature extraction, this model can handle scheduling instances of varying scales and map size-independent processing information matrix to fixed action space. Additionally, a dense reward based on a predefined scheduling region is developed to offer detailed guidance to the agent, enabling more precise and comprehensive policy assessment. Static tests are conducted on well-known benchmarks, and the experimental results indicate that our scheduling model surpasses the performance of the three latest DRL approaches on average. 
Compared with PDR methods, dynamic experiments demonstrate that the proposed DRL model excels in adaptability and robustness when new tasks arrive and the processing time fluctuates with uncertainty.</div></div>","PeriodicalId":10542,"journal":{"name":"Computers & Operations Research","volume":"185 ","pages":"Article 107263"},"PeriodicalIF":4.3000,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A deep reinforcement learning approach for dynamic job-shop scheduling problem considering time variable and new job arrivals\",\"authors\":\"Haoyang Yu , Wenbin Gu , Na Tang , Zhenyang Guo\",\"doi\":\"10.1016/j.cor.2025.107263\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In recent years, the complexity of the production process due to increased demand for customization has greatly increased the difficulty of dynamic job-shop scheduling problem (DJSP). This paper proposes a deep reinforcement learning (DRL) approach to tackle the DJSP based on proximal policy optimization (PPO) algorithm. A novel state representation method that expresses state features as multi-channel images is proposed to simplify the state characterization process. Various heuristic-based priority dispatching rules (PDRs)are used to construct action space. By converting scheduling instances into images and leveraging the spatial pyramid pooling fast (SPPF) module for feature extraction, this model can handle scheduling instances of varying scales and map size-independent processing information matrix to fixed action space. Additionally, a dense reward based on a predefined scheduling region is developed to offer detailed guidance to the agent, enabling more precise and comprehensive policy assessment. 
Static tests are conducted on well-known benchmarks, and the experimental results indicate that our scheduling model surpasses the performance of the three latest DRL approaches on average. Compared with PDR methods, dynamic experiments demonstrate that the proposed DRL model excels in adaptability and robustness when new tasks arrive and the processing time fluctuates with uncertainty.</div></div>\",\"PeriodicalId\":10542,\"journal\":{\"name\":\"Computers & Operations Research\",\"volume\":\"185 \",\"pages\":\"Article 107263\"},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2025-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers & Operations Research\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0305054825002928\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Operations Research","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0305054825002928","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
A deep reinforcement learning approach for dynamic job-shop scheduling problem considering time variable and new job arrivals
In recent years, the growing demand for customization has increased the complexity of production processes and, with it, the difficulty of the dynamic job-shop scheduling problem (DJSP). This paper proposes a deep reinforcement learning (DRL) approach to the DJSP based on the proximal policy optimization (PPO) algorithm. A novel state representation that expresses state features as multi-channel images is proposed to simplify state characterization. Various heuristic-based priority dispatching rules (PDRs) are used to construct the action space. By converting scheduling instances into images and leveraging the spatial pyramid pooling fast (SPPF) module for feature extraction, the model can handle scheduling instances of varying scales and map size-independent processing-information matrices to a fixed action space. Additionally, a dense reward based on a predefined scheduling region is developed to offer detailed guidance to the agent, enabling more precise and comprehensive policy assessment. Static tests on well-known benchmarks show that the proposed scheduling model surpasses three recent DRL approaches on average. Compared with PDR methods, dynamic experiments demonstrate that the proposed DRL model excels in adaptability and robustness when new jobs arrive and processing times fluctuate under uncertainty.
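The abstract's key design choice, a fixed action space of priority dispatching rules, can be illustrated with a minimal sketch. This is not the authors' code: the `Operation` fields and the particular rule set (SPT, LPT, FIFO, MWKR are common PDRs in the scheduling literature) are assumptions for illustration. The point is that the agent outputs an index into a fixed rule list, so the same policy head works regardless of instance size.

```python
# Illustrative sketch (assumed, not the paper's implementation):
# a fixed action space of heuristic priority dispatching rules (PDRs).
from dataclasses import dataclass

@dataclass
class Operation:
    job_id: int
    proc_time: float       # processing time of this operation
    arrival: float         # time the operation became available
    remaining_work: float  # total work left on the job (incl. this op)

def spt(ops):   # Shortest Processing Time
    return min(ops, key=lambda o: o.proc_time)

def lpt(ops):   # Longest Processing Time
    return max(ops, key=lambda o: o.proc_time)

def fifo(ops):  # First In, First Out
    return min(ops, key=lambda o: o.arrival)

def mwkr(ops):  # Most Work Remaining
    return max(ops, key=lambda o: o.remaining_work)

# Fixed action space: the agent picks an index into this list,
# so the policy output dimension is independent of instance size.
ACTIONS = [spt, lpt, fifo, mwkr]

def dispatch(action_index, waiting_ops):
    """Apply the PDR chosen by the agent to pick the next operation."""
    return ACTIONS[action_index](waiting_ops)

ops = [
    Operation(job_id=0, proc_time=5.0, arrival=0.0, remaining_work=12.0),
    Operation(job_id=1, proc_time=2.0, arrival=1.0, remaining_work=9.0),
    Operation(job_id=2, proc_time=7.0, arrival=0.5, remaining_work=7.0),
]
assert dispatch(0, ops).job_id == 1  # SPT picks the shortest operation
assert dispatch(3, ops).job_id == 0  # MWKR picks the job with most work left
```

Mapping decisions to rule indices, rather than to operations directly, is what lets a single trained policy generalize across scheduling instances of varying scales, which is the property the abstract attributes to the SPPF-based model.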
Journal introduction:
Operations research and computers meet in a large number of scientific fields, many of which are of vital current concern to our troubled society. These include, among others, ecology, transportation, safety, reliability, urban planning, economics, inventory control, investment strategy and logistics (including reverse logistics). Computers & Operations Research provides an international forum for the application of computers and operations research techniques to problems in these and related fields.