Real-time response to machine failures in self-organizing production execution using multi-agent reinforcement learning with effective samples

IF 9.1 · CAS Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Yong Gui, Dunbing Tang, Yuqian Lu, Haihua Zhu, Zequn Zhang, Changchun Liu
{"title":"基于有效样本的多智能体强化学习对自组织生产执行中机器故障的实时响应","authors":"Yong Gui ,&nbsp;Dunbing Tang ,&nbsp;Yuqian Lu ,&nbsp;Haihua Zhu ,&nbsp;Zequn Zhang ,&nbsp;Changchun Liu","doi":"10.1016/j.rcim.2025.103038","DOIUrl":null,"url":null,"abstract":"<div><div>With the growing demand for personalized production, multi-agent technology has been introduced to facilitate rapid self-organizing production execution. The application of communication protocols and dynamic scheduling algorithms supports multi-agent negotiation and real-time scheduling decisions in response to conventional production events. To address machine failures, real-time response strategies have been developed to manage jobs affected by the disruptions. However, the performance of existing strategies varies significantly depending on the real-time production state. In this paper, we propose a real-time response strategy using multi-agent reinforcement learning (MARL) that provides an appropriate response strategy for each job affected by machine failures, considering the real-time production state. Specifically, we establish a self-organizing production execution process with machine failures to specify the real-time response problem. Subsequently, a Markov game involving multiple buffer agents is constructed, transforming the real-time response problem into a MARL task. Furthermore, a continuous variable ranging from 0 to 1 is defined as the action space for each buffer agent, allowing it to select a response strategy for each affected job. Finally, a modified multi-agent deep deterministic policy gradient (MADDPG) algorithm is introduced, leveraging effective samples to train buffer agents at each failure moment. This enables the selection of an optimal response strategy for each affected job. Experimental results indicate that the proposed real-time response strategy outperforms both existing response strategies and the original MADDPG-based strategy across 54 distinct production configurations.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"95 ","pages":"Article 103038"},"PeriodicalIF":9.1000,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Real-time response to machine failures in self-organizing production execution using multi-agent reinforcement learning with effective samples\",\"authors\":\"Yong Gui ,&nbsp;Dunbing Tang ,&nbsp;Yuqian Lu ,&nbsp;Haihua Zhu ,&nbsp;Zequn Zhang ,&nbsp;Changchun Liu\",\"doi\":\"10.1016/j.rcim.2025.103038\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>With the growing demand for personalized production, multi-agent technology has been introduced to facilitate rapid self-organizing production execution. The application of communication protocols and dynamic scheduling algorithms supports multi-agent negotiation and real-time scheduling decisions in response to conventional production events. To address machine failures, real-time response strategies have been developed to manage jobs affected by the disruptions. However, the performance of existing strategies varies significantly depending on the real-time production state. In this paper, we propose a real-time response strategy using multi-agent reinforcement learning (MARL) that provides an appropriate response strategy for each job affected by machine failures, considering the real-time production state. 
Specifically, we establish a self-organizing production execution process with machine failures to specify the real-time response problem. Subsequently, a Markov game involving multiple buffer agents is constructed, transforming the real-time response problem into a MARL task. Furthermore, a continuous variable ranging from 0 to 1 is defined as the action space for each buffer agent, allowing it to select a response strategy for each affected job. Finally, a modified multi-agent deep deterministic policy gradient (MADDPG) algorithm is introduced, leveraging effective samples to train buffer agents at each failure moment. This enables the selection of an optimal response strategy for each affected job. Experimental results indicate that the proposed real-time response strategy outperforms both existing response strategies and the original MADDPG-based strategy across 54 distinct production configurations.</div></div>\",\"PeriodicalId\":21452,\"journal\":{\"name\":\"Robotics and Computer-integrated Manufacturing\",\"volume\":\"95 \",\"pages\":\"Article 103038\"},\"PeriodicalIF\":9.1000,\"publicationDate\":\"2025-04-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Robotics and Computer-integrated Manufacturing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0736584525000924\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Robotics and Computer-integrated Manufacturing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0736584525000924","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

With the growing demand for personalized production, multi-agent technology has been introduced to facilitate rapid self-organizing production execution. The application of communication protocols and dynamic scheduling algorithms supports multi-agent negotiation and real-time scheduling decisions in response to conventional production events. To address machine failures, real-time response strategies have been developed to manage jobs affected by the disruptions. However, the performance of existing strategies varies significantly depending on the real-time production state. In this paper, we propose a real-time response strategy using multi-agent reinforcement learning (MARL) that provides an appropriate response strategy for each job affected by machine failures, considering the real-time production state. Specifically, we establish a self-organizing production execution process with machine failures to specify the real-time response problem. Subsequently, a Markov game involving multiple buffer agents is constructed, transforming the real-time response problem into a MARL task. Furthermore, a continuous variable ranging from 0 to 1 is defined as the action space for each buffer agent, allowing it to select a response strategy for each affected job. Finally, a modified multi-agent deep deterministic policy gradient (MADDPG) algorithm is introduced, leveraging effective samples to train buffer agents at each failure moment. This enables the selection of an optimal response strategy for each affected job. Experimental results indicate that the proposed real-time response strategy outperforms both existing response strategies and the original MADDPG-based strategy across 54 distinct production configurations.
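
To make the described pipeline concrete, the following is a minimal, hypothetical PyTorch sketch of the actor-critic structure the abstract outlines: each buffer agent's actor maps its observation of the real-time production state to a scalar action in [0, 1] (a sigmoid output enforces the range), while an MADDPG-style centralized critic scores the joint observations and actions. The state dimension, agent count, network sizes, the effective-sample filter, and the decoding of the action into a response strategy are all assumptions for illustration; the paper's actual design is not reproduced here.

import torch
import torch.nn as nn

STATE_DIM = 16   # assumed size of a buffer agent's observation of the production state
N_AGENTS = 4     # assumed number of buffer agents in the Markov game

class Actor(nn.Module):
    """Maps one buffer agent's observation to a scalar action in [0, 1]."""
    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),  # sigmoid keeps the action inside [0, 1]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class CentralCritic(nn.Module):
    """MADDPG-style centralized critic: sees every agent's observation and action."""
    def __init__(self, state_dim: int, n_agents: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_agents * (state_dim + 1), 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, all_obs: torch.Tensor, all_acts: torch.Tensor) -> torch.Tensor:
        # all_obs: (batch, n_agents, state_dim); all_acts: (batch, n_agents, 1)
        return self.net(torch.cat([all_obs.flatten(1), all_acts.flatten(1)], dim=1))

def action_to_strategy(a: float) -> str:
    """Hypothetical decoding of the continuous action into a response strategy
    for one affected job; the paper's concrete mapping is not given here."""
    return "reroute_to_alternative_machine" if a < 0.5 else "wait_for_repair"

def keep_effective(transitions, reward_threshold: float = 0.0):
    """One plausible reading of 'effective samples': keep only transitions whose
    reward clears a threshold before they are replayed for training. The paper's
    actual selection criterion is an assumption here."""
    return [t for t in transitions if t["reward"] > reward_threshold]

if __name__ == "__main__":
    actors = [Actor(STATE_DIM) for _ in range(N_AGENTS)]
    critic = CentralCritic(STATE_DIM, N_AGENTS)
    obs = torch.randn(1, N_AGENTS, STATE_DIM)  # observations at one failure moment
    acts = torch.stack([a(obs[:, i]) for i, a in enumerate(actors)], dim=1)
    q_value = critic(obs, acts)                # centralized value of the joint action
    print([action_to_strategy(a.item()) for a in acts[0, :, 0]])

Bounding the actor output with a sigmoid is a standard way to realize a continuous [0, 1] action space in deterministic policy gradient methods; during training one would add exploration noise and clip the result back into the interval.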
Source journal
Robotics and Computer-integrated Manufacturing (Engineering Technology: Manufacturing)
CiteScore: 24.10
Self-citation rate: 13.50%
Annual publications: 160
Review turnaround: 50 days
Journal introduction: The journal Robotics and Computer-integrated Manufacturing focuses on research applications that contribute to new or enhanced robotics, manufacturing technologies, and innovative manufacturing strategies relevant to industry. Papers combining theory with experimental validation are preferred, and review papers on current robotics and manufacturing issues are also considered. Papers on traditional machining processes, modeling and simulation, supply chain management, and resource optimization are generally out of scope, as are overly theoretical or mathematical papers; these are better suited to other journals. The journal welcomes original papers in areas such as industrial robotics, human-robot collaboration in manufacturing, cloud-based manufacturing, cyber-physical production systems, big data analytics in manufacturing, smart mechatronics, machine learning, adaptive and sustainable manufacturing, and other fields involving unique manufacturing technologies.