Deep Reinforcement Learning-based Multi-Objective Scheduling for Distributed Heterogeneous Hybrid Flow Shops with Blocking Constraints

IF 10.1 | Zone 1, Engineering & Technology | Q1 ENGINEERING, MULTIDISCIPLINARY

Engineering, Pub Date: 2025-03-01, DOI: 10.1016/j.eng.2024.11.033

Xueyan Sun, Weiming Shen, Jiaxin Fan, Birgit Vogel-Heuser, Fandi Bi, Chunjiang Zhang

Engineering, Volume 46, Pages 278-291 (Journal Article). https://www.sciencedirect.com/science/article/pii/S2095809924007264

Citations: 0

Abstract

This paper investigates a distributed heterogeneous hybrid blocking flow-shop scheduling problem (DHHBFSP) designed to minimize the total tardiness and total energy consumption simultaneously, and proposes an improved proximal policy optimization (IPPO) method to make real-time decisions for the DHHBFSP. A multi-objective Markov decision process is modeled for the DHHBFSP, where the reward function is represented by a vector with dynamic weights instead of the common objective-related scalar value. A factory agent (FA) is formulated for each factory to select unscheduled jobs and is trained by the proposed IPPO to improve the decision quality. Multiple FAs work asynchronously to allocate jobs that arrive randomly at the shop. A two-stage training strategy is introduced in the IPPO, which learns from both single- and dual-policy data for better data utilization. The proposed IPPO is tested on randomly generated instances and compared with variants of the basic proximal policy optimization (PPO), dispatch rules, multi-objective metaheuristics, and multi-agent reinforcement learning methods. Extensive experimental results suggest that the proposed strategies offer significant improvements to the basic PPO, and the proposed IPPO outperforms the state-of-the-art scheduling methods in both convergence and solution quality.
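The abstract names two generic ingredients: a reward vector over the two objectives (total tardiness and total energy consumption) that is combined under changing weights, and the clipped policy objective of basic PPO, which IPPO builds on. The following is a minimal sketch of those textbook building blocks only, under stated assumptions. The reward definition (negated increments of each objective), the weight scheme (uniform sampling on the 2-simplex), and all function names are hypothetical and are not the authors' IPPO formulation.

```python
import random
from typing import Sequence, Tuple


def vector_reward(delta_tardiness: float, delta_energy: float) -> Tuple[float, float]:
    """Hypothetical per-decision reward vector: the negated increase in total
    tardiness and in total energy consumption caused by one dispatch decision.
    Both objectives are minimized, so smaller increases give larger rewards."""
    return (-delta_tardiness, -delta_energy)


def sample_weights(rng: random.Random) -> Tuple[float, float]:
    """Hypothetical 'dynamic weight' scheme: draw a new (w_tardiness, w_energy)
    pair uniformly from the 2-simplex, e.g. once per training episode."""
    w = rng.random()
    return (w, 1.0 - w)


def scalarize(reward: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization: weighted sum of the reward components."""
    if len(reward) != len(weights):
        raise ValueError("reward and weights must have the same length")
    return sum(r * w for r, w in zip(reward, weights))


def ppo_clip_objective(ratios: Sequence[float], advantages: Sequence[float],
                       eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate objective, averaged over a batch.

    ratios[i] is pi_new(a_i | s_i) / pi_old(a_i | s_i). The objective is
    maximized; clipping removes the incentive to push a ratio outside
    [1 - eps, 1 + eps].
    """
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must be non-empty and equal-length")
    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(ratios)


if __name__ == "__main__":
    rng = random.Random(0)
    weights = sample_weights(rng)  # a freshly sampled trade-off between objectives
    r = vector_reward(delta_tardiness=3.0, delta_energy=1.5)
    print(scalarize(r, (0.5, 0.5)))  # -2.25
    print(scalarize(r, weights))  # depends on the sampled weights
    print(ppo_clip_objective([1.5, 0.5], [1.0, -1.0]))  # approximately 0.2
```

Scalarizing with weights that change during training is one common way to expose a single policy to different trade-offs between the objectives. How IPPO actually sets its dynamic weights, and how its two-stage training combines single- and dual-policy data, is not specified in the abstract and is not reproduced here.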

Source journal

Engineering (subject category: Environmental Science, Environmental Engineering)

Self-citation rate: 1.60%

Articles published: 335

Review time: 35 days

Journal description: Engineering, an international open-access journal initiated by the Chinese Academy of Engineering (CAE) in 2015, serves as a distinguished platform for disseminating cutting-edge advancements in engineering R&D, sharing major research outputs, and highlighting key achievements worldwide. The journal's objectives encompass reporting progress in engineering science, fostering discussions on hot topics, addressing areas of interest, challenges, and prospects in engineering development, while considering human and environmental well-being and ethics in engineering. It aims to inspire breakthroughs and innovations with profound economic and social significance, propelling them to advanced international standards and transforming them into a new productive force. Ultimately, this endeavor seeks to bring about positive changes globally, benefit humanity, and shape a new future.