A collaborative-learning multi-agent reinforcement learning method for distributed hybrid flow shop scheduling problem

IF 8.2 · Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Swarm and Evolutionary Computation Pub Date : 2024-11-09 DOI:10.1016/j.swevo.2024.101764

Yuanzhu Di, Libao Deng, Lili Zhang

Swarm and Evolutionary Computation, Volume 91, Article 101764. Journal Article. https://www.sciencedirect.com/science/article/pii/S221065022400302X

Citations: 0

Abstract

With the growing use of artificial intelligence in solving complex engineering optimization problems, various learning mechanisms, including deep learning (DL) and reinforcement learning (RL), have been developed for manufacturing scheduling. In this paper, a collaborative-learning multi-agent RL method (CL-MARL) is proposed for solving the distributed hybrid flow-shop scheduling problem (DHFSP), minimizing both makespan and total energy consumption. First, the DHFSP is formulated as a Markov decision process: the features of machines and jobs are represented as state and observation matrices according to their characteristics, the candidate operation set is used as the action space, and a reward mechanism is designed based on machine utilization. Next, a set of critic networks and actor networks, consisting of recurrent neural networks and fully connected networks, is employed to map the states and observations into output values. Then, a novel distance matching strategy is designed for each agent to select the most appropriate action at each scheduling step. Finally, the proposed CL-MARL model is trained with the multi-agent deep deterministic policy gradient algorithm in a collaborative-learning manner. The numerical results support the effectiveness of the proposed multi-agent system, and comparisons with existing algorithms indicate the high potential of CL-MARL in solving the DHFSP.
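The abstract names a utilization-based reward and a distance matching step but gives no formulas for either. The sketch below is one plausible reading, not the authors' implementation. It assumes that the actor emits a continuous vector (as deterministic policy gradient actors typically do), that each candidate operation has a feature vector of the same length, and that matching means picking the nearest candidate by Euclidean distance. The function names `machine_utilization`, `utilization_reward`, and `match_action`, and the reward defined as a change in mean utilization, are hypothetical.

```python
import math
from typing import Sequence


def machine_utilization(busy_time: Sequence[float], elapsed: float) -> float:
    """Mean fraction of the elapsed time that the machines spent processing."""
    if elapsed <= 0 or not busy_time:
        return 0.0
    return sum(busy_time) / (len(busy_time) * elapsed)


def utilization_reward(prev_util: float, new_util: float) -> float:
    """Assumed reward shape: change in mean utilization after one scheduling step."""
    return new_util - prev_util


def match_action(actor_output: Sequence[float],
                 candidates: Sequence[Sequence[float]]) -> int:
    """Return the index of the candidate operation whose feature vector is
    closest (Euclidean distance) to the actor's continuous output."""
    if not candidates:
        raise ValueError("candidate operation set is empty")
    distances = [math.dist(actor_output, c) for c in candidates]
    # Ties resolve to the lowest index because min() keeps the first minimum.
    return min(range(len(candidates)), key=distances.__getitem__)


# Illustrative data only: three candidate operations with 2-D features.
candidates = [[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]]
chosen = match_action([0.7, 0.2], candidates)  # nearest candidate is index 1
```

Mapping a continuous actor output onto a discrete candidate set in this way is a common device for applying deterministic policy gradient methods to combinatorial action spaces; whether CL-MARL uses this exact distance or feature encoding is not stated in the abstract.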




Source journal

Swarm and Evolutionary Computation (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, THEORY & METHODS)

CiteScore: 16.00

Self-citation rate: 12.00%

Articles published: 169

About the journal: Swarm and Evolutionary Computation is a pioneering peer-reviewed journal focused on the latest research and advancements in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes the publication of high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence. Additionally, it welcomes survey papers on current topics and novel applications. Topics of interest include but are not limited to: Genetic Algorithms and Genetic Programming, Evolution Strategies and Evolutionary Programming, Differential Evolution, Artificial Immune Systems, Particle Swarms, Ant Colony, Bacterial Foraging, Artificial Bees, Fireflies Algorithm, Harmony Search, Artificial Life, Digital Organisms, Estimation of Distribution Algorithms, Stochastic Diffusion Search, Quantum Computing, Nano Computing, Membrane Computing, Human-centric Computing, Hybridization of Algorithms, Memetic Computing, Autonomic Computing, Self-organizing Systems, and Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.