{"title":"基于启发式Petri网驱动的深度强化学习增强机器人作业车间的实时调度","authors":"Sijia Yi , Jiliang Luo","doi":"10.1016/j.rcim.2025.103097","DOIUrl":null,"url":null,"abstract":"<div><div>In robotic job shops (RJS), a significant challenge lies in optimizing task allocation and robot routing simultaneously, especially since these tasks must be accomplished in real-time to efficiently manage unexpected situations, such as the urgent need for AGV recharging or sudden order additions. Deep reinforcement learning (DRL) shows promise for these complex scheduling tasks due to its ability to address problems characterized by substantial computational complexity. However, the rapid expansion of RJS state space and the difficulty of avoiding cyclic loops for AGVs pose significant challenges for DRL in realistic settings. To address these, we present a novel approach combining an artificial-potential-field (APF) with a deep Q-network (DQN) in a Petri net framework. The APF is designed for Petri nets to guide token movement toward goal place nodes. Throughout the learning process, the APF-guided mixed policy employs a cosine-annealing probability for APF policy and a piecewise linear probability for random policy. Initially, action selections predominantly rely on APF policy to efficiently gather high-reward experience. As training progresses, they shifts to more rely on the learned neural-network policy, with random exploration supplementing diversity, ensuring a robust transition from reward-driven exploration to precise decision-making. The APF-DQN method is tested in real-world RJS scenarios, showing superior exploration success and training efficiency over baseline DQN. It significantly outperforms both conventional dispatching rules and baseline DQN, reducing average makespan by over 55% compared to dispatching rules and by 14.9% relative to baseline DQN. This method significantly enhances traditional DQN by improving exploration success, learning efficiency, policy convergence, and adaptability to dynamic environments.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"97 ","pages":"Article 103097"},"PeriodicalIF":11.4000,"publicationDate":"2025-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Deep reinforcement learning driven by heuristics with Petri nets for enhancing real-time scheduling in robotic job shops\",\"authors\":\"Sijia Yi , Jiliang Luo\",\"doi\":\"10.1016/j.rcim.2025.103097\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In robotic job shops (RJS), a significant challenge lies in optimizing task allocation and robot routing simultaneously, especially since these tasks must be accomplished in real-time to efficiently manage unexpected situations, such as the urgent need for AGV recharging or sudden order additions. Deep reinforcement learning (DRL) shows promise for these complex scheduling tasks due to its ability to address problems characterized by substantial computational complexity. However, the rapid expansion of RJS state space and the difficulty of avoiding cyclic loops for AGVs pose significant challenges for DRL in realistic settings. To address these, we present a novel approach combining an artificial-potential-field (APF) with a deep Q-network (DQN) in a Petri net framework. The APF is designed for Petri nets to guide token movement toward goal place nodes. 
Throughout the learning process, the APF-guided mixed policy employs a cosine-annealing probability for APF policy and a piecewise linear probability for random policy. Initially, action selections predominantly rely on APF policy to efficiently gather high-reward experience. As training progresses, they shifts to more rely on the learned neural-network policy, with random exploration supplementing diversity, ensuring a robust transition from reward-driven exploration to precise decision-making. The APF-DQN method is tested in real-world RJS scenarios, showing superior exploration success and training efficiency over baseline DQN. It significantly outperforms both conventional dispatching rules and baseline DQN, reducing average makespan by over 55% compared to dispatching rules and by 14.9% relative to baseline DQN. This method significantly enhances traditional DQN by improving exploration success, learning efficiency, policy convergence, and adaptability to dynamic environments.</div></div>\",\"PeriodicalId\":21452,\"journal\":{\"name\":\"Robotics and Computer-integrated Manufacturing\",\"volume\":\"97 \",\"pages\":\"Article 103097\"},\"PeriodicalIF\":11.4000,\"publicationDate\":\"2025-08-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Robotics and Computer-integrated Manufacturing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0736584525001516\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Robotics and Computer-integrated Manufacturing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0736584525001516","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Deep reinforcement learning driven by heuristics with Petri nets for enhancing real-time scheduling in robotic job shops
In robotic job shops (RJS), a significant challenge lies in optimizing task allocation and robot routing simultaneously, especially since these tasks must be accomplished in real time to handle unexpected situations efficiently, such as an urgent need for AGV recharging or sudden order additions. Deep reinforcement learning (DRL) shows promise for these complex scheduling tasks owing to its ability to address problems of substantial computational complexity. However, the rapid expansion of the RJS state space and the difficulty of avoiding cyclic loops for AGVs pose significant challenges for DRL in realistic settings. To address these, we present a novel approach combining an artificial potential field (APF) with a deep Q-network (DQN) in a Petri net framework. The APF is designed for Petri nets to guide token movement toward goal place nodes. Throughout the learning process, the APF-guided mixed policy employs a cosine-annealing probability for the APF policy and a piecewise-linear probability for the random policy. Initially, action selection relies predominantly on the APF policy to efficiently gather high-reward experience. As training progresses, it shifts to rely more on the learned neural-network policy, with random exploration supplementing diversity, ensuring a robust transition from reward-driven exploration to precise decision-making. The APF-DQN method is tested in real-world RJS scenarios, showing superior exploration success and training efficiency over a baseline DQN. It significantly outperforms both conventional dispatching rules and the baseline DQN, reducing average makespan by over 55% compared with dispatching rules and by 14.9% relative to the baseline DQN. The method thus enhances traditional DQN by improving exploration success, learning efficiency, policy convergence, and adaptability to dynamic environments.
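To make the action-selection schedule concrete, the following is a minimal Python sketch of how such an APF-guided mixed policy could be implemented. The abstract only states that the APF policy's probability follows cosine annealing and the random policy's probability follows a piecewise-linear schedule; the specific bounds, breakpoints, and function names below are illustrative assumptions, not details taken from the paper.

```python
import math
import random

def apf_probability(step, total_steps, p_start=0.9, p_end=0.05):
    """Cosine-annealed probability of following the APF heuristic policy.

    Decays smoothly from p_start to p_end over training, so early actions
    are mostly heuristic-guided. (Bounds are hypothetical, not from the paper.)
    """
    frac = min(step / total_steps, 1.0)
    return p_end + 0.5 * (p_start - p_end) * (1.0 + math.cos(math.pi * frac))

def random_probability(step, total_steps, p0=0.1, p_mid=0.2, p_end=0.02):
    """Piecewise-linear probability of taking a uniformly random action:
    rises to p_mid at the midpoint of training, then falls to p_end.
    (The breakpoint and values are assumed for illustration.)
    """
    frac = min(step / total_steps, 1.0)
    if frac < 0.5:
        return p0 + (p_mid - p0) * (frac / 0.5)
    return p_mid + (p_end - p_mid) * ((frac - 0.5) / 0.5)

def select_action(step, total_steps, enabled_transitions, apf_policy, dqn_policy):
    """Mixed policy: choose the APF action, a random action, or the learned
    DQN action according to the two schedules; the leftover probability mass
    goes to the learned policy, which dominates late in training."""
    u = random.random()
    p_apf = apf_probability(step, total_steps)
    p_rand = random_probability(step, total_steps)
    if u < p_apf:
        return apf_policy(enabled_transitions)      # potential-field-guided heuristic
    if u < p_apf + p_rand:
        return random.choice(enabled_transitions)   # random exploration for diversity
    return dqn_policy(enabled_transitions)          # learned neural-network policy
```

Under this schedule, at step 0 nearly all probability mass sits on the APF and random policies, matching the abstract's description of reward-driven exploration early on, while by the end of training the learned DQN policy receives almost all of the mass.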
Journal introduction:
The journal, Robotics and Computer-Integrated Manufacturing, focuses on sharing research applications that contribute to the development of new or enhanced robotics, manufacturing technologies, and innovative manufacturing strategies that are relevant to industry. Papers that combine theory and experimental validation are preferred, while review papers on current robotics and manufacturing issues are also considered. However, papers on traditional machining processes, modeling and simulation, supply chain management, and resource optimization are generally not within the scope of the journal, as there are more appropriate journals for these topics. Similarly, papers that are overly theoretical or mathematical will be directed to other suitable journals. The journal welcomes original papers in areas such as industrial robotics, human-robot collaboration in manufacturing, cloud-based manufacturing, cyber-physical production systems, big data analytics in manufacturing, smart mechatronics, machine learning, adaptive and sustainable manufacturing, and other fields involving unique manufacturing technologies.