ETQ-learning: an improved Q-learning algorithm for path planning

IF 2.3 · CAS Tier 4 (Computer Science) · JCR Q3 (Robotics)
Huanwei Wang, Jing Jing, Qianlv Wang, Hongqi He, Xuyan Qi, Rui Lou
{"title":"ETQ-learning: 一种改进的路径规划 Q-learning 算法","authors":"Huanwei Wang, Jing Jing, Qianlv Wang, Hongqi He, Xuyan Qi, Rui Lou","doi":"10.1007/s11370-024-00544-3","DOIUrl":null,"url":null,"abstract":"<p>Path planning algorithm has always been the core of intelligent robot research; a good path planning algorithm can significantly enhance the efficiency of robots in executing tasks. As the application scenarios for intelligent robots continue to diversify, their adaptability to the environment has become a key focus in current path planning algorithm research. As one of the classic reinforcement learning algorithms, Q-learning (QL) algorithm has its inherent advantages in adapting to the environment, but it also faces various challenges and shortcomings. These issues are primarily centered around suboptimal path planning, slow convergence speed, weak generalization capability and poor obstacle avoidance performance. In order to solve these issues in the QL algorithm, we have carried out the following work. (1) We redesign the reward mechanism of QL algorithm. The traditional Q-learning algorithm’s reward mechanism is simple to implement but lacks directionality. We propose a combined reward mechanism of \"static assignment + dynamic adjustment.\" This mechanism can address the issue of random path selection and ultimately lead to optimal path planning. (2) We redesign the greedy strategy of QL algorithm. In the traditional Q-learning algorithm, the greedy factor in the strategy is either randomly generated or set manually, which limits its applicability to some extent. It is difficult to effectively applied to different physical environments and scenarios, which is the fundamental reason for the poor generalization capability of the algorithm. We propose a dynamic adjustment of the greedy factor, known as the <span>\\(\\varepsilon -acc-increasing\\)</span> greedy strategy, which significantly improves the efficiency of Q-learning algorithm and enhances its generalization capability so that the algorithm has a wider range of application scenarios. (3) We introduce a concept to enhance the algorithm’s obstacle avoidance performance. We design the expansion distance, which pre-sets a \"collision buffer\" between the obstacle and agent to enhance the algorithm’s obstacle avoidance performance.\n</p>","PeriodicalId":48813,"journal":{"name":"Intelligent Service Robotics","volume":"172 1","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ETQ-learning: an improved Q-learning algorithm for path planning\",\"authors\":\"Huanwei Wang, Jing Jing, Qianlv Wang, Hongqi He, Xuyan Qi, Rui Lou\",\"doi\":\"10.1007/s11370-024-00544-3\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Path planning algorithm has always been the core of intelligent robot research; a good path planning algorithm can significantly enhance the efficiency of robots in executing tasks. As the application scenarios for intelligent robots continue to diversify, their adaptability to the environment has become a key focus in current path planning algorithm research. As one of the classic reinforcement learning algorithms, Q-learning (QL) algorithm has its inherent advantages in adapting to the environment, but it also faces various challenges and shortcomings. 
These issues are primarily centered around suboptimal path planning, slow convergence speed, weak generalization capability and poor obstacle avoidance performance. In order to solve these issues in the QL algorithm, we have carried out the following work. (1) We redesign the reward mechanism of QL algorithm. The traditional Q-learning algorithm’s reward mechanism is simple to implement but lacks directionality. We propose a combined reward mechanism of \\\"static assignment + dynamic adjustment.\\\" This mechanism can address the issue of random path selection and ultimately lead to optimal path planning. (2) We redesign the greedy strategy of QL algorithm. In the traditional Q-learning algorithm, the greedy factor in the strategy is either randomly generated or set manually, which limits its applicability to some extent. It is difficult to effectively applied to different physical environments and scenarios, which is the fundamental reason for the poor generalization capability of the algorithm. We propose a dynamic adjustment of the greedy factor, known as the <span>\\\\(\\\\varepsilon -acc-increasing\\\\)</span> greedy strategy, which significantly improves the efficiency of Q-learning algorithm and enhances its generalization capability so that the algorithm has a wider range of application scenarios. (3) We introduce a concept to enhance the algorithm’s obstacle avoidance performance. We design the expansion distance, which pre-sets a \\\"collision buffer\\\" between the obstacle and agent to enhance the algorithm’s obstacle avoidance performance.\\n</p>\",\"PeriodicalId\":48813,\"journal\":{\"name\":\"Intelligent Service Robotics\",\"volume\":\"172 1\",\"pages\":\"\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2024-06-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Intelligent Service Robotics\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s11370-024-00544-3\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ROBOTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Intelligent Service Robotics","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11370-024-00544-3","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ROBOTICS","Score":null,"Total":0}
Citations: 0

Abstract



Path planning algorithms have always been at the core of intelligent robot research; a good path planning algorithm can significantly enhance the efficiency of robots in executing tasks. As the application scenarios for intelligent robots continue to diversify, their adaptability to the environment has become a key focus of current path planning research. As one of the classic reinforcement learning algorithms, the Q-learning (QL) algorithm has inherent advantages in adapting to the environment, but it also faces various challenges and shortcomings. These issues are primarily suboptimal path planning, slow convergence, weak generalization capability, and poor obstacle avoidance. To address these issues in the QL algorithm, we carried out the following work. (1) We redesign the reward mechanism of the QL algorithm. The traditional Q-learning reward mechanism is simple to implement but lacks directionality. We propose a combined reward mechanism of "static assignment + dynamic adjustment." This mechanism addresses the issue of random path selection and ultimately leads to optimal path planning. (2) We redesign the greedy strategy of the QL algorithm. In the traditional Q-learning algorithm, the greedy factor is either randomly generated or set manually, which limits its applicability to some extent. It is difficult to apply effectively to different physical environments and scenarios, which is the fundamental reason for the algorithm's poor generalization capability. We propose a dynamic adjustment of the greedy factor, the \(\varepsilon\)-acc-increasing greedy strategy, which significantly improves the efficiency of the Q-learning algorithm and enhances its generalization capability so that the algorithm applies to a wider range of scenarios. (3) We introduce a concept to enhance the algorithm's obstacle avoidance performance: the expansion distance, which pre-sets a "collision buffer" between the obstacle and the agent.
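The abstract names three mechanisms but gives no implementation details. As a rough illustration only, the following Python sketch shows a grid-world Q-learning loop that combines plausible readings of all three ideas: a "static + dynamic" shaped reward, a greedy factor that increases with training progress (one possible interpretation of \(\varepsilon\)-acc-increasing), and an expansion-distance penalty around obstacles. The grid, reward constants, schedule, and all names are our own assumptions, not the paper's published design.

```python
# Minimal sketch of the three ETQ-learning ideas as this reader interprets
# them. All constants, the grid, and the epsilon schedule are illustrative
# assumptions, not the paper's implementation.
import random

# Grid world: 0 = free cell, 1 = obstacle. Start top-left, goal bottom-right.
GRID = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
ROWS, COLS = len(GRID), len(GRID[0])
START, GOAL = (0, 0), (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ALPHA, GAMMA = 0.1, 0.9
EXPANSION = 1  # "expansion distance": cells this close to an obstacle are penalized

def near_obstacle(state):
    """True if the cell lies within the expansion distance of any obstacle,
    i.e. inside the pre-set 'collision buffer'."""
    r, c = state
    for dr in range(-EXPANSION, EXPANSION + 1):
        for dc in range(-EXPANSION, EXPANSION + 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < ROWS and 0 <= cc < COLS and GRID[rr][cc] == 1:
                return True
    return False

def reward(state, nxt):
    """'Static assignment + dynamic adjustment' reward (illustrative values):
    fixed rewards for goal/collision, plus a dynamic term that pays for
    moving closer to the goal and penalizes entering the collision buffer."""
    if nxt == GOAL:
        return 100.0                       # static: reaching the goal
    if GRID[nxt[0]][nxt[1]] == 1:
        return -100.0                      # static: hitting an obstacle
    d_old = abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])
    d_new = abs(nxt[0] - GOAL[0]) + abs(nxt[1] - GOAL[1])
    r = 1.0 if d_new < d_old else -1.0     # dynamic: progress toward the goal
    if near_obstacle(nxt):
        r -= 2.0                           # dynamic: stay out of the buffer
    return r

def epsilon(episode, total):
    """A guessed 'epsilon-acc-increasing' schedule: the probability of taking
    the greedy action grows with training instead of being fixed by hand."""
    return min(0.95, episode / total)

Q = {((r, c), a): 0.0 for r in range(ROWS) for c in range(COLS)
     for a in range(len(ACTIONS))}

EPISODES = 500
for ep in range(EPISODES):
    state = START
    for _ in range(200):                   # step cap per episode
        if random.random() < epsilon(ep, EPISODES):
            a = max(range(len(ACTIONS)), key=lambda x: Q[(state, x)])
        else:
            a = random.randrange(len(ACTIONS))
        nr, nc = state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1]
        # Bumping a wall keeps the agent in place.
        nxt = (nr, nc) if 0 <= nr < ROWS and 0 <= nc < COLS else state
        r = reward(state, nxt)
        best_next = max(Q[(nxt, x)] for x in range(len(ACTIONS)))
        Q[(state, a)] += ALPHA * (r + GAMMA * best_next - Q[(state, a)])
        if nxt == GOAL or GRID[nxt[0]][nxt[1]] == 1:
            break
        state = nxt
```

Note one deliberate inversion relative to classic \(\varepsilon\)-greedy: here epsilon denotes the probability of exploiting (taking the greedy action) and rises over training, so exploration dominates early episodes and exploitation dominates late ones, which is how we read "acc-increasing" in the abstract.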

Source journal: Intelligent Service Robotics — CiteScore 5.70, self-citation rate 4.00%, 46 articles per year.
About the journal: The journal directs special attention to the emerging significance of integrating robotics with information technology and cognitive science (such as ubiquitous and adaptive computing, information integration in a distributed environment, and cognitive modelling for human-robot interaction), which spurs innovation toward a new multi-dimensional robotic service to humans. The journal intends to capture and archive this emerging yet significant advancement in the field of intelligent service robotics. The journal will publish original papers of innovative ideas and concepts, new discoveries and improvements, as well as novel applications and business models which are related to the field of intelligent service robotics described above and are proven to be of high quality. The areas that the journal will cover include, but are not limited to: intelligent robots serving humans in daily life or in a hazardous environment, such as home or personal service robots, entertainment robots, education robots, medical robots, healthcare and rehabilitation robots, and rescue robots (Service Robotics); intelligent robotic functions in the form of embedded systems for applications to, for example, intelligent space, intelligent vehicles and transportation systems, intelligent manufacturing systems, and intelligent medical facilities (Embedded Robotics); the integration of robotics with network technologies, generating such services and solutions as distributed robots, distance robotic education-aides, and virtual laboratories or museums (Networked Robotics).