Data-driven hierarchical multi-policy deep reinforcement learning framework for multi-objective multiplicity dynamic flexible job shop scheduling

IF 12.2 · Zone 1 (Engineering & Technology) · Q1 ENGINEERING, INDUSTRIAL

Journal of Manufacturing Systems · Pub Date: 2025-04-04 · DOI: 10.1016/j.jmsy.2025.03.019

Linshan Ding, Zailin Guan, Dan Luo, Lei Yue

Volume 80, Pages 536-562 · Article page: https://www.sciencedirect.com/science/article/pii/S0278612525000809

Citations: 0

Abstract

In the context of Industry 4.0, manufacturers face pressure to personalize products and accelerate the supply chain. This requires rapid response to volatile production schedules while balancing operational efficiency and product quality. Moreover, the rapid development and convergence of cloud computing, the Internet of Things (IoT), and big data have expanded the need for real-time tracking and adaptive scheduling to address uncertainties such as equipment downtime, supply variation, and ongoing product revisions. IoT capabilities have significantly improved continuous monitoring and data analysis, underscoring the importance of developing effective real-time scheduling solutions for manufacturing systems. In response to these evolving industrial requirements, and driven by the objectives of reducing makespan, total tardiness, and energy consumption, we study the multi-objective multiplicity dynamic flexible job shop scheduling problem (MOMDFJSP) to cope with the challenges of new order arrivals and machine breakdowns in IoT-enabled manufacturing systems. This study proposes a novel hierarchical multi-policy deep reinforcement learning framework for IoT-infused manufacturing environments, aiming to integrate these diverse requirements and uncertainties into a coherent and responsive scheduling framework. The proposed framework comprises an upper-level control policy network and three lower-level objective policy networks. The upper-level network selects a temporary optimization objective, and the lower-level networks select specific dispatching rules. Based on the proposed framework, we design a two-stage training approach, named the hierarchical multi-policy soft actor-critic (HMPSAC) algorithm, to train the multiple policy networks. In addition, we develop a fluid model to design the state features and dispatching rules that act as inputs and outputs, respectively, of the deep reinforcement learning (DRL) policy networks. A comparative analysis against well-known dispatching rules and DRL-based methods reveals the superior performance of the HMPSAC algorithm.
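The two-level decision flow described in the abstract can be illustrated with a minimal Python sketch. It only shows the control flow (the upper level picks a temporary objective, then the matching lower-level policy picks a dispatching rule) and is not the authors' implementation. The stub policies stand in for the trained SAC networks, and the rule names `SPT`, `EDD`, and `LOW_POWER`, the class `HierarchicalScheduler`, and the state vector are all assumptions made for illustration; the paper derives its actual state features and dispatching rules from a fluid model that the abstract does not detail.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

# The three objectives named in the abstract.
OBJECTIVES = ("makespan", "total_tardiness", "energy_consumption")

# Hypothetical placeholder rules; the paper's rules come from its fluid model.
RULES = ("SPT", "EDD", "LOW_POWER")

# A policy maps state features to an action index.
Policy = Callable[[Sequence[float]], int]


def make_stub_policy(n_actions: int, seed: int) -> Policy:
    """Stand-in for a trained policy network (assumption: random choice)."""
    rng = random.Random(seed)

    def policy(state: Sequence[float]) -> int:
        return rng.randrange(n_actions)

    return policy


class HierarchicalScheduler:
    """One upper-level control policy plus one lower-level policy per objective."""

    def __init__(self, upper: Policy, lower: Dict[str, Policy]) -> None:
        self.upper = upper
        self.lower = lower

    def decide(self, state: Sequence[float]) -> Tuple[str, str]:
        # Upper level: choose the temporary optimization objective.
        objective = OBJECTIVES[self.upper(state)]
        # Lower level: the policy for that objective chooses a dispatching rule.
        rule = RULES[self.lower[objective](state)]
        return objective, rule


def build_scheduler(seed: int = 0) -> HierarchicalScheduler:
    upper = make_stub_policy(len(OBJECTIVES), seed)
    lower = {
        obj: make_stub_policy(len(RULES), seed + i + 1)
        for i, obj in enumerate(OBJECTIVES)
    }
    return HierarchicalScheduler(upper, lower)


if __name__ == "__main__":
    scheduler = build_scheduler()
    state = [0.4, 0.1, 0.7]  # made-up state features
    for step in range(3):
        print(step, scheduler.decide(state))
```

In a real system, `decide` would be called at each rescheduling point (for example, a new order arrival or a machine breakdown), and the stub policies would be replaced by trained networks.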



Source journal: Journal of Manufacturing Systems (Engineering & Technology, Engineering: Industrial)

CiteScore: 23.30

Self-citation rate: 13.20%

Articles published: 216

Review time: 25 days

About the journal: The Journal of Manufacturing Systems is dedicated to showcasing cutting-edge fundamental and applied research in manufacturing at the systems level. Encompassing products, equipment, people, information, control, and support functions, manufacturing systems play a pivotal role in the economical and competitive development, production, delivery, and total lifecycle of products, meeting market and societal needs. With a commitment to publishing archival scholarly literature, the journal strives to advance the state of the art in manufacturing systems and foster innovation in crafting efficient, robust, and sustainable manufacturing systems. The focus extends from equipment-level considerations to the broader scope of the extended enterprise. The Journal welcomes research addressing challenges across various scales, including nano, micro, and macro-scale manufacturing, and spanning diverse sectors such as aerospace, automotive, energy, and medical device manufacturing.