Study on the application of single-agent and multi-agent reinforcement learning to dynamic scheduling in manufacturing environments with growing complexity: Case study on the synthesis of an industrial IoT Test Bed

IF 12.2, Zone 1 (Engineering and Technology), Q1 ENGINEERING, INDUSTRIAL

Journal of Manufacturing Systems Pub Date : 2024-10-19 DOI:10.1016/j.jmsy.2024.09.019

David Heik, Fouad Bahrpeyma, Dirk Reichelt

Journal of Manufacturing Systems, volume 77, pages 525-557. Full text: https://www.sciencedirect.com/science/article/pii/S0278612524002206

Citations: 0

Abstract

Industry 4.0, smart manufacturing and smart products have recently attracted substantial attention and are becoming increasingly prevalent in manufacturing systems. As a result of the successful implementation of these technologies, highly customized products can be manufactured using responsive, autonomous manufacturing processes at a competitive cost. This study was conducted at HTW Dresden’s Industrial Internet of Things Test Bed, which simulates state-of-the-art manufacturing scenarios for educational and research purposes. Apart from the physical production facility itself, the associated operational information systems have been fully interconnected to allow fast and efficient information exchange between the various manufacturing stages and systems. This interconnection provides a strong foundation for dealing appropriately with unexpected or planned environmental changes, as well as prevailing uncertainty, which greatly increases the overall system’s resilience. The main objective of this study is to increase the efficiency of the manufacturing system in order to optimize resource consumption and minimize the overall completion time (makespan). This manuscript discusses our experiments in the area of flexible job-shop scheduling problems (FJSP). As part of our research, different methods of representing the state space were explored; heuristic, meta-heuristic, reinforcement learning (RL), and multi-agent reinforcement learning (MARL) methods were evaluated; and various methods of interaction with the system (designing the action space and filtering in certain situations) were examined. Furthermore, the design of the reward function, which plays an important role in formulating the dynamic scheduling problem as an RL problem, is discussed in depth. Finally, this paper studies the effectiveness of single-agent and multi-agent RL approaches, with a special focus on the Proximal Policy Optimization (PPO) method, on the fully-fledged digital twin of an industrial IoT system at HTW Dresden. In our experiments, in a multi-agent setting with an individual agent for each manufacturing operation, PPO managed the resources in a way that significantly improved the manufacturing system’s performance.
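To make the terms in the abstract concrete, the following is a minimal sketch of a flexible job-shop episode with action filtering and a makespan-based reward. It is not the authors' environment or reward design: the toy instance `JOBS`, the function names, and the reward (the negative increase in makespan per step) are all assumptions chosen for illustration, and a simple earliest-completion heuristic stands in for the place where an RL policy such as PPO would pick the action.

```python
from typing import Dict, List, Tuple

# Hypothetical toy instance (not data from the paper): JOBS[j][k] maps each
# eligible machine to the processing time of operation k of job j.
JOBS: List[List[Dict[int, int]]] = [
    [{0: 3, 1: 5}, {1: 2}],    # job 0: op 0 on machine 0 or 1, op 1 on machine 1
    [{0: 4}, {0: 3, 1: 1}],    # job 1: op 0 on machine 0, op 1 on machine 0 or 1
]
NUM_MACHINES = 2


def valid_actions(jobs, next_op) -> List[Tuple[int, int]]:
    """Action filter: keep only feasible (job, machine) pairs."""
    return [
        (j, m)
        for j, ops in enumerate(jobs)
        if next_op[j] < len(ops)      # job still has an unscheduled operation
        for m in ops[next_op[j]]      # machine is eligible for that operation
    ]


def greedy_schedule(jobs, num_machines) -> Tuple[int, int]:
    """Run one episode and return (makespan, total_reward)."""
    next_op = [0] * len(jobs)             # index of each job's next operation
    job_ready = [0] * len(jobs)           # time each job's previous op finishes
    machine_ready = [0] * num_machines    # time each machine becomes free
    makespan = 0
    total_reward = 0

    def completion(action):
        j, m = action
        start = max(job_ready[j], machine_ready[m])
        return start + jobs[j][next_op[j]][m]

    while True:
        actions = valid_actions(jobs, next_op)
        if not actions:
            break  # every operation has been scheduled
        # Stand-in policy: an RL agent would choose among `actions` here.
        j, m = min(actions, key=completion)
        end = completion((j, m))
        job_ready[j] = end
        machine_ready[m] = end
        next_op[j] += 1
        # Assumed reward shaping: penalize any growth of the makespan.
        new_makespan = max(makespan, end)
        total_reward -= new_makespan - makespan
        makespan = new_makespan
    return makespan, total_reward


if __name__ == "__main__":
    print(greedy_schedule(JOBS, NUM_MACHINES))
```

On this toy instance the heuristic reaches a makespan of 8. With this reward the per-step penalties telescope, so the episode return equals the negative makespan; that is one common way to tie an RL objective to makespan minimization, not necessarily the one used in the study.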

Source journal

Journal of Manufacturing Systems (Engineering and Technology: Industrial Engineering)

CiteScore: 23.30
Self-citation rate: 13.20%
Articles published: 216
Review time: 25 days

Journal description: The Journal of Manufacturing Systems is dedicated to showcasing cutting-edge fundamental and applied research in manufacturing at the systems level. Encompassing products, equipment, people, information, control, and support functions, manufacturing systems play a pivotal role in the economical and competitive development, production, delivery, and total lifecycle of products, meeting market and societal needs. With a commitment to publishing archival scholarly literature, the journal strives to advance the state of the art in manufacturing systems and foster innovation in crafting efficient, robust, and sustainable manufacturing systems. The focus extends from equipment-level considerations to the broader scope of the extended enterprise. The Journal welcomes research addressing challenges across various scales, including nano, micro, and macro-scale manufacturing, and spanning diverse sectors such as aerospace, automotive, energy, and medical device manufacturing.