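Integrated dynamic scheduling method for hybrid flow shop with machine preventive maintenance based on cooperative multi-agent deep reinforcement learning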

IF 11.4 | Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Robotics and Computer-integrated Manufacturing, Vol. 97, Article 103085 | Pub Date: 2025-06-25 | DOI: 10.1016/j.rcim.2025.103085

Siqi Liu, Haiping Zhu, LieZheng Sheng

Citations: 0

Abstract

Integrated dynamic scheduling method for hybrid flow shop with machine preventive maintenance based on cooperative multi-agent deep reinforcement learning

Hybrid flow shops, widely used in the manufacturing industry, face the challenge of complex and dynamic production environments. Most existing studies do not consider machine preventive maintenance and dynamic events in the hybrid flow shop scheduling process. This paper therefore presents an integrated dynamic scheduling method for the hybrid flow shop scheduling problem with unrelated parallel machines considering preventive maintenance (DHSFP-UPM-PM). The scheduling objectives are to minimize completion time, processing cost, and maintenance cost. First, the research problem and the basic maintenance strategy are defined in detail, and an integrated mathematical model of DHSFP-UPM-PM is constructed. Then, an integrated dynamic scheduling framework based on cooperative multi-agent deep reinforcement learning is proposed to solve DHSFP-UPM-PM. Within this framework, a cooperative multi-processing-stage-agent (PSA) approach is proposed to move from the traditional single-agent setting to a multi-agent one. A cooperative multi-agent Markov decision process is formulated to clarify the interaction between each agent and the production environment, and the state and action spaces, the key elements of the scheduling model, are designed for each PSA. To optimize the scheduling objectives, the paper further formulates a new global reward mechanism and a centralized-training, decentralized-execution method based on multi-agent proximal policy optimization. Finally, the experimental results verify the superiority and effectiveness of the proposed method in solving the integrated scheduling problem and handling dynamic events. The proposed method also shows remarkable adaptability and flexibility under different production scenarios, which demonstrates the benefits of multi-agent deep reinforcement learning in complex and dynamic environments.
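The abstract names a global reward over three objectives and a centralized-training, decentralized-execution scheme with one agent per processing stage, but it gives no formulas. The following is only a minimal sketch of how those pieces could fit together. The weighted-sum reward, the `StepOutcome` fields, the `StageAgent` class, and the earliest-free-machine rule (a stand-in for a learned policy) are all assumptions made for illustration, not the paper's design.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class StepOutcome:
    """Hypothetical per-step changes in the three objectives from the abstract."""
    makespan_increase: float   # growth in completion time caused by this step
    processing_cost: float
    maintenance_cost: float


def global_reward(outcome: StepOutcome,
                  weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Shared reward for all agents: negative weighted sum (assumed form)."""
    w_time, w_proc, w_maint = weights
    return -(w_time * outcome.makespan_increase
             + w_proc * outcome.processing_cost
             + w_maint * outcome.maintenance_cost)


class StageAgent:
    """Hypothetical processing-stage agent (PSA) that sees only its own stage."""

    def __init__(self, stage: int) -> None:
        self.stage = stage

    def act(self, machine_ready_times: Sequence[float]) -> int:
        # Placeholder for a learned policy: pick the machine that frees up first.
        return min(range(len(machine_ready_times)),
                   key=machine_ready_times.__getitem__)


def decentralized_step(agents: List[StageAgent],
                       local_observations: List[Sequence[float]]) -> List[int]:
    """Decentralized execution: each agent acts on its local observation only."""
    return [agent.act(obs) for agent, obs in zip(agents, local_observations)]


def centralized_critic_input(local_observations: List[Sequence[float]]) -> List[float]:
    """Centralized training: a critic would receive the joint observation."""
    return [x for obs in local_observations for x in obs]


agents = [StageAgent(stage=s) for s in range(2)]
observations = [[4.0, 1.5, 3.0], [2.0, 2.5]]   # machine ready times per stage
actions = decentralized_step(agents, observations)
reward = global_reward(StepOutcome(2.0, 5.0, 1.0), weights=(1.0, 0.5, 0.5))
joint = centralized_critic_input(observations)
print(actions, reward, joint)
```

In a policy-gradient method such as multi-agent proximal policy optimization, the single `reward` value would be given to every agent, while the joint observation would be used only by the critic during training.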

Source journal

Robotics and Computer-integrated Manufacturing (Engineering Technology - Engineering: Manufacturing)

CiteScore: 24.10

Self-citation rate: 13.50%

Articles published: 160

Review time: 50 days

About the journal: The journal, Robotics and Computer-Integrated Manufacturing, focuses on sharing research applications that contribute to the development of new or enhanced robotics, manufacturing technologies, and innovative manufacturing strategies that are relevant to industry. Papers that combine theory and experimental validation are preferred, while review papers on current robotics and manufacturing issues are also considered. However, papers on traditional machining processes, modeling and simulation, supply chain management, and resource optimization are generally not within the scope of the journal, as there are more appropriate journals for these topics. Similarly, papers that are overly theoretical or mathematical will be directed to other suitable journals. The journal welcomes original papers in areas such as industrial robotics, human-robot collaboration in manufacturing, cloud-based manufacturing, cyber-physical production systems, big data analytics in manufacturing, smart mechatronics, machine learning, adaptive and sustainable manufacturing, and other fields involving unique manufacturing technologies.