Long short term robot safe control framework based on hidden action space heuristic soft actor–critic

IF 11.4, Region 1 (Computer Science), JCR Q1 (COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS)

Robotics and Computer-integrated Manufacturing Pub Date : 2025-09-08 DOI:10.1016/j.rcim.2025.103107

Yixuan Liang, Ze Wang, Yunan Wang, Jichuan Yu, Jizhou Yan, Shize Lin, Zhao Jin, Jiuru Lu, Chuxiong Hu

Volume 98, Article 103107. Full text: https://www.sciencedirect.com/science/article/pii/S0736584525001619

Citations: 0

Abstract

Long short term robot safe control framework based on hidden action space heuristic soft actor–critic

Real-time safety control for robots in dynamic environments is a critical and challenging problem in robotics. With the advent of intelligent manufacturing, demand for advanced robot safety control technologies has steadily increased. Robot planning typically involves global and local approaches, but both face limitations in real-time safety control in dynamic environments with unknown obstacles. Recent hybrid frameworks have made progress, but challenges persist, including limited perception capabilities and poor coordination between global and local components. To address these challenges, this work proposes a novel long short term safety control framework that uses reinforcement learning for decision-making. Perception and planning are each decoupled into long-term and short-term components: long-term perception uses the unsupervised clustering algorithm DBSCAN to extract structured environment information, and short-term perception improves efficiency through prior knowledge. Long-term planning provides reference trajectories based on the static environment, while short-term planning adjusts these trajectories in real time for local safety using control barrier functions. Built on hidden action heuristic soft actor–critic and curriculum learning, the decision-making mechanism keeps the robot safe when it encounters obstacles or attacks and maximizes robot efficiency without compromising safety. Experiments cover 10,000 randomized obstacle collision scenarios, and the framework is compared with four methods, including SAC and manually designed trajectory adjustment. The results show that the approach outperforms these methods in both safety performance and operational efficiency. Finally, the system is successfully implemented in a physical environment, demonstrating its potential for real-world applications.
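The abstract states that short-term planning adjusts reference trajectories with control barrier functions but gives no formulas. The following is only a minimal sketch of that general idea, not the paper's implementation. It assumes single-integrator dynamics (velocity is commanded directly), one circular obstacle, the barrier h(x) = ||x - o||^2 - r^2, and a gain `alpha`. The names `go_to_goal`, `cbf_filter` and `simulate` are hypothetical, and the reinforcement learning decision layer is not modeled.

```python
import math


def go_to_goal(x, goal, v_max=1.0):
    """Reference velocity: head straight for the goal, capped at v_max."""
    d = [g - xi for g, xi in zip(goal, x)]
    dist = math.sqrt(sum(di * di for di in d))
    if dist <= v_max:
        return d
    return [v_max * di / dist for di in d]


def cbf_filter(x, u_ref, obstacle, radius, alpha=1.0):
    """Return the velocity closest to u_ref that satisfies the CBF condition.

    Assumed model: x_dot = u, barrier h(x) = ||x - obstacle||^2 - radius^2.
    The condition grad_h . u >= -alpha * h is a single half-space constraint,
    so the minimum-change solution is a closed-form projection.
    """
    dx = [xi - oi for xi, oi in zip(x, obstacle)]
    h = sum(d * d for d in dx) - radius ** 2
    a = [2.0 * d for d in dx]            # gradient of h with respect to x
    b = -alpha * h                       # constraint: a . u >= b
    slack = sum(ai * ui for ai, ui in zip(a, u_ref)) - b
    if slack >= 0.0:
        return list(u_ref)               # reference command is already safe
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        # Degenerate case (exactly at the obstacle center): no direction
        # satisfies the constraint, so stop. This fallback is an assumption.
        return [0.0] * len(u_ref)
    scale = -slack / norm_sq             # positive: push along the gradient
    return [ui + scale * ai for ui, ai in zip(u_ref, a)]


def simulate(start, goal, obstacle, radius, dt=0.01, steps=1000):
    """Euler-integrate the filtered command and return the visited points."""
    x = list(start)
    path = [tuple(x)]
    for _ in range(steps):
        u = cbf_filter(x, go_to_goal(x, goal), obstacle, radius)
        x = [xi + dt * ui for xi, ui in zip(x, u)]
        path.append(tuple(x))
    return path


if __name__ == "__main__":
    obstacle, radius = (2.0, 0.1), 0.5
    path = simulate((0.0, 0.0), (4.0, 0.0), obstacle, radius)
    closest = min(
        math.sqrt((px - obstacle[0]) ** 2 + (py - obstacle[1]) ** 2)
        for px, py in path
    )
    print(f"closest approach: {closest:.3f} (obstacle radius {radius})")
```

In this sketch a larger `alpha` lets the robot approach the obstacle boundary more aggressively before the filter intervenes, while a smaller `alpha` starts slowing it down earlier. A learned policy could, for example, choose such parameters or the reference command, but how the paper couples the two is not described in the abstract.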

Source journal

Robotics and Computer-integrated Manufacturing (Engineering & Technology, Engineering: Manufacturing)

CiteScore: 24.10
Self-citation rate: 13.50%
Articles published: 160
Review time: 50 days

Journal description: Robotics and Computer-Integrated Manufacturing focuses on sharing research applications that contribute to the development of new or enhanced robotics, manufacturing technologies, and innovative manufacturing strategies that are relevant to industry. Papers that combine theory and experimental validation are preferred, while review papers on current robotics and manufacturing issues are also considered. However, papers on traditional machining processes, modeling and simulation, supply chain management, and resource optimization are generally not within the scope of the journal, as there are more appropriate journals for these topics. Similarly, papers that are overly theoretical or mathematical will be directed to other suitable journals. The journal welcomes original papers in areas such as industrial robotics, human-robot collaboration in manufacturing, cloud-based manufacturing, cyber-physical production systems, big data analytics in manufacturing, smart mechatronics, machine learning, adaptive and sustainable manufacturing, and other fields involving unique manufacturing technologies.