Hybrid of representation learning and reinforcement learning for dynamic and complex robotic motion planning

IF 5.2 | Zone 2 (Computer Science) | JCR Q1, AUTOMATION & CONTROL SYSTEMS

Robotics and Autonomous Systems Pub Date : 2025-08-14 DOI:10.1016/j.robot.2025.105167

Chengmin Zhou, Xin Lu, Jiapeng Dai, Xiaoxu Liu, Bingding Huang, Pasi Fränti

Volume 194, Article 105167. Publisher page: https://www.sciencedirect.com/science/article/pii/S0921889025002647

Citations: 0

Abstract

Motion planning is the soul of robot decision making. Classical planning algorithms, such as graph search and reaction-based algorithms, struggle with dense and dynamic obstacles. Deep learning algorithms generate suboptimal one-step predictions that cause many collisions. Reinforcement learning algorithms generate optimal or near-optimal time-sequential predictions, but they suffer from slow convergence, suboptimal converged results, and unstable training. This paper introduces a hybrid algorithm for robotic motion planning: long short-term memory (LSTM) and skip connection for attention-based discrete soft actor critic (LSA-DSAC). First, a graph network (relational graph) and an attention network (attention weights) interpret the environmental state for the learning of the discrete soft actor critic (DSAC) algorithm. A difference analysis of these two representation methods shows that, in our task, the attention network has greater expressive power than the graph network. However, attention-based DSAC faces the problem of unstable training (vanishing gradient). Second, the skip connection method is integrated into attention-based DSAC to mitigate unstable training and improve convergence speed. Third, an LSTM replaces the sum operator over the attention weights, which eliminates unstable training at the cost of slightly slower convergence in early-stage training. Experiments show that LSA-DSAC outperforms the state-of-the-art in training and in most evaluations. Physical robots are also implemented and tested in the real world.
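The abstract names three building blocks: attention weights pooled by a sum operator, a skip connection, and a discrete soft actor critic learner. The following NumPy sketch illustrates each in its simplest form. It is not the authors' implementation: the feature sizes, the scoring vector `score_w`, and the choice to realize the skip connection as concatenation of the raw robot state are all assumptions made for illustration, and the LSTM variant that replaces the sum is not shown. The soft state value uses the standard discrete-action form, V = sum over a of pi(a) * (Q(a) - alpha * log pi(a)).

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(features, score_w):
    """Pool per-obstacle features with attention weights and a sum operator."""
    scores = features @ score_w            # one score per obstacle, shape (n,)
    weights = softmax(scores)              # attention weights, sum to 1
    return weights @ features, weights     # weighted sum, shape (d,)

def with_skip(robot_state, pooled):
    """Skip connection (assumed form): forward the raw robot state past the pooling."""
    return np.concatenate([robot_state, pooled])

def soft_state_value(logits, q_values, alpha):
    """Discrete SAC soft value: expectation of Q(a) - alpha * log pi(a) under pi."""
    pi = softmax(logits)
    return float(np.sum(pi * (q_values - alpha * np.log(pi))))

# Hypothetical sizes: 5 obstacles, 8-dim embeddings, 4-dim robot state, 3 actions.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 8))
score_w = rng.normal(size=8)
robot_state = rng.normal(size=4)

pooled, weights = attention_pool(features, score_w)
policy_input = with_skip(robot_state, pooled)
value = soft_state_value(np.zeros(3), np.array([1.0, 2.0, 3.0]), alpha=0.2)
```

With uniform logits the policy is uniform, so `value` equals the mean Q-value plus `alpha * log(3)`, which shows how the entropy term rewards keeping several actions available.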




Source journal

Robotics and Autonomous Systems (Engineering & Technology: Robotics)

CiteScore: 9.00

Self-citation rate: 7.00%

Articles published: 164

Review time: 4.5 months

About the journal: Robotics and Autonomous Systems will carry articles describing fundamental developments in the field of robotics, with special emphasis on autonomous systems. An important goal of this journal is to extend the state of the art in both symbolic and sensory based robot control and learning in the context of autonomous systems. Robotics and Autonomous Systems will carry articles on the theoretical, computational and experimental aspects of autonomous systems, or modules of such systems.