A deep reinforcement learning approach with graph attention network and multi-signal differential reward for dynamic hybrid flow shop scheduling problem
Authors: Youshan Liu, Jiaxin Fan, Weiming Shen
Journal: Journal of Manufacturing Systems, vol. 80, pp. 643-661
Publication date: 2025-04-13
DOI: 10.1016/j.jmsy.2025.03.028
URL: https://www.sciencedirect.com/science/article/pii/S0278612525000883
Impact factor: 12.2 (JCR Q1, Engineering, Industrial)
Citation count: 0
Abstract
In real-life manufacturing systems, production management often faces uncertainty caused by urgent demands and dynamic job insertions. Such uncertain environments pose significant challenges for scheduling, particularly in minimizing delivery delays and improving overall efficiency. Deep reinforcement learning (DRL) offers the potential for rapid, real-time production decisions, but scheduling in these environments with the objective of reducing delivery delays remains challenging. This paper investigates a dynamic hybrid flow-shop scheduling problem with job insertions, with the objective of minimizing the total weighted tardiness (TWT). An end-to-end DRL-based method, proximal policy optimization with graph attention network (PPO-GAT), is proposed to address the problem. First, a multi-agent system is established to simulate the actual manufacturing system and to serve as a foundation for implementing intelligent production scheduling. Then, a novel graph-based state representation is developed to observe the instantaneous state of the hybrid flow-shop. Two graph models are designed to represent system features and job features; these features are extracted and fused by graph attention networks (GATs) to form a global feature. Next, a multi-signal differential reward (MSDR) function is designed to address the intractable reward sparsity caused by the TWT objective. Finally, ablation experiments are conducted to validate each proposed algorithmic component, and PPO-GAT is compared with benchmark methods. Experimental results demonstrate the superiority of the proposed GAT, MSDR, and PPO-GAT. Moreover, PPO-GAT is shown to make real-time scheduling decisions for hybrid flow-shops of any scale, making it a promising solution for a wide range of industrial applications.
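To make the objective concrete, the short Python sketch below computes the total weighted tardiness, TWT = sum over jobs of w_j * max(0, C_j - d_j), and illustrates why a differential reward can ease reward sparsity. It is not the paper's MSDR function, whose signals are not described in the abstract. The names `Job`, `estimated_twt`, and `differential_reward`, and the choice to treat unfinished jobs as finishing at the current time, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    weight: float                       # tardiness weight w_j
    due_date: float                     # due date d_j
    completion: Optional[float] = None  # completion time C_j; None while unfinished


def total_weighted_tardiness(jobs: List[Job]) -> float:
    """TWT over finished jobs: sum of w_j * max(0, C_j - d_j)."""
    return sum(
        j.weight * max(0.0, j.completion - j.due_date)
        for j in jobs
        if j.completion is not None
    )


def estimated_twt(jobs: List[Job], now: float) -> float:
    """Hypothetical running estimate of the TWT at time `now`.

    Assumption: an unfinished job cannot finish before `now`, so `now`
    is used as a lower bound on its completion time.
    """
    total = 0.0
    for j in jobs:
        finish = j.completion if j.completion is not None else now
        total += j.weight * max(0.0, finish - j.due_date)
    return total


def differential_reward(prev_estimate: float, new_estimate: float) -> float:
    """Illustrative dense reward: the negative increase in the estimate
    between two consecutive decision points."""
    return prev_estimate - new_estimate
```

Under these assumptions, if the estimate starts at zero, the per-step rewards telescope to the negative final TWT, so the agent receives feedback at every decision point instead of only when the schedule is complete.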
About the journal:
The Journal of Manufacturing Systems is dedicated to showcasing cutting-edge fundamental and applied research in manufacturing at the systems level. Encompassing products, equipment, people, information, control, and support functions, manufacturing systems play a pivotal role in the economical and competitive development, production, delivery, and total lifecycle of products, meeting market and societal needs.
With a commitment to publishing archival scholarly literature, the journal strives to advance the state of the art in manufacturing systems and foster innovation in crafting efficient, robust, and sustainable manufacturing systems. The focus extends from equipment-level considerations to the broader scope of the extended enterprise. The Journal welcomes research addressing challenges across various scales, including nano, micro, and macro-scale manufacturing, and spanning diverse sectors such as aerospace, automotive, energy, and medical device manufacturing.