针对轻型机器人的包含多步 "后见之明 "经验回放的强化学习路径规划方法

IF 3.7 2区工程技术 Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Displays Pub Date : 2024-07-14 DOI:10.1016/j.displa.2024.102796

Jiaqi Wang, Huiyan Han, Xie Han, Liqun Kuang, Xiaowen Yang

{"title":"针对轻型机器人的包含多步 \"后见之明 \"经验回放的强化学习路径规划方法","authors":"Jiaqi Wang, Huiyan Han, Xie Han, Liqun Kuang, Xiaowen Yang","doi":"10.1016/j.displa.2024.102796","DOIUrl":null,"url":null,"abstract":"<div><p>Home service robots prioritize cost-effectiveness and convenience over the precision required for industrial tasks like autonomous driving, making their task execution more easily. Meanwhile, path planning tasks using Deep Reinforcement Learning(DRL) are commonly sparse reward problems with limited data utilization, posing challenges in obtaining meaningful rewards during training, consequently resulting in slow or challenging training. In response to these challenges, our paper introduces a lightweight end-to-end path planning algorithm employing with hindsight experience replay(HER). Initially, we optimize the reinforcement learning training process from scratch and map the complex high-dimensional action space and state space to the representative low-dimensional action space. At the same time, we improve the network structure to decouple the model navigation and obstacle avoidance module to meet the requirements of lightweight. Subsequently, we integrate HER and curriculum learning (CL) to tackle issues related to inefficient training. Additionally, we propose a multi-step hindsight experience replay (MS-HER) specifically for the path planning task, markedly enhancing both training efficiency and model generalization across diverse environments. To substantiate the enhanced training efficiency of the refined algorithm, we conducted tests within diverse Gazebo simulation environments. Results of the experiments reveal noteworthy enhancements in critical metrics, including success rate and training efficiency. To further ascertain the enhanced algorithm’s generalization capability, we evaluate its performance in some ”never-before-seen” simulation environment. Ultimately, we deploy the trained model onto a real lightweight robot for validation. The experimental outcomes indicate the model’s competence in successfully executing the path planning task, even on a small robot with constrained computational resources.</p></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"84 ","pages":"Article 102796"},"PeriodicalIF":3.7000,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Reinforcement learning path planning method incorporating multi-step Hindsight Experience Replay for lightweight robots\",\"authors\":\"Jiaqi Wang, Huiyan Han, Xie Han, Liqun Kuang, Xiaowen Yang\",\"doi\":\"10.1016/j.displa.2024.102796\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Home service robots prioritize cost-effectiveness and convenience over the precision required for industrial tasks like autonomous driving, making their task execution more easily. Meanwhile, path planning tasks using Deep Reinforcement Learning(DRL) are commonly sparse reward problems with limited data utilization, posing challenges in obtaining meaningful rewards during training, consequently resulting in slow or challenging training. In response to these challenges, our paper introduces a lightweight end-to-end path planning algorithm employing with hindsight experience replay(HER). Initially, we optimize the reinforcement learning training process from scratch and map the complex high-dimensional action space and state space to the representative low-dimensional action space. At the same time, we improve the network structure to decouple the model navigation and obstacle avoidance module to meet the requirements of lightweight. Subsequently, we integrate HER and curriculum learning (CL) to tackle issues related to inefficient training. Additionally, we propose a multi-step hindsight experience replay (MS-HER) specifically for the path planning task, markedly enhancing both training efficiency and model generalization across diverse environments. To substantiate the enhanced training efficiency of the refined algorithm, we conducted tests within diverse Gazebo simulation environments. Results of the experiments reveal noteworthy enhancements in critical metrics, including success rate and training efficiency. To further ascertain the enhanced algorithm’s generalization capability, we evaluate its performance in some ”never-before-seen” simulation environment. Ultimately, we deploy the trained model onto a real lightweight robot for validation. The experimental outcomes indicate the model’s competence in successfully executing the path planning task, even on a small robot with constrained computational resources.</p></div>\",\"PeriodicalId\":50570,\"journal\":{\"name\":\"Displays\",\"volume\":\"84 \",\"pages\":\"Article 102796\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2024-07-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Displays\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0141938224001604\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Displays","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0141938224001604","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

摘要

与自动驾驶等工业任务所需的精度相比，家用服务机器人更注重成本效益和便利性，这使其更容易执行任务。与此同时，使用深度强化学习（DRL）的路径规划任务通常是数据利用率有限的稀疏奖励问题，在训练过程中难以获得有意义的奖励，从而导致训练速度缓慢或训练难度增加。为了应对这些挑战，我们的论文介绍了一种采用事后经验重放（HER）的轻量级端到端路径规划算法。首先，我们从头开始优化强化学习训练过程，将复杂的高维行动空间和状态空间映射到有代表性的低维行动空间。同时，我们改进了网络结构，将模型导航和避障模块解耦，以满足轻量级的要求。随后，我们整合了 HER 和课程学习（CL），以解决训练效率低下的相关问题。此外，我们还针对路径规划任务提出了多步骤后见经验重放（MS-HER），显著提高了训练效率和模型在不同环境下的泛化能力。为了证实改进算法提高了训练效率，我们在不同的 Gazebo 仿真环境中进行了测试。实验结果表明，成功率和训练效率等关键指标都有显著提高。为了进一步确定增强算法的泛化能力，我们在一些 "前所未见 "的模拟环境中对其性能进行了评估。最后，我们将训练好的模型部署到一个真正的轻型机器人上进行验证。实验结果表明，即使在计算资源有限的小型机器人上，该模型也能成功执行路径规划任务。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Reinforcement learning path planning method incorporating multi-step Hindsight Experience Replay for lightweight robots

Home service robots prioritize cost-effectiveness and convenience over the precision required for industrial tasks like autonomous driving, making their task execution more easily. Meanwhile, path planning tasks using Deep Reinforcement Learning(DRL) are commonly sparse reward problems with limited data utilization, posing challenges in obtaining meaningful rewards during training, consequently resulting in slow or challenging training. In response to these challenges, our paper introduces a lightweight end-to-end path planning algorithm employing with hindsight experience replay(HER). Initially, we optimize the reinforcement learning training process from scratch and map the complex high-dimensional action space and state space to the representative low-dimensional action space. At the same time, we improve the network structure to decouple the model navigation and obstacle avoidance module to meet the requirements of lightweight. Subsequently, we integrate HER and curriculum learning (CL) to tackle issues related to inefficient training. Additionally, we propose a multi-step hindsight experience replay (MS-HER) specifically for the path planning task, markedly enhancing both training efficiency and model generalization across diverse environments. To substantiate the enhanced training efficiency of the refined algorithm, we conducted tests within diverse Gazebo simulation environments. Results of the experiments reveal noteworthy enhancements in critical metrics, including success rate and training efficiency. To further ascertain the enhanced algorithm’s generalization capability, we evaluate its performance in some ”never-before-seen” simulation environment. Ultimately, we deploy the trained model onto a real lightweight robot for validation. The experimental outcomes indicate the model’s competence in successfully executing the path planning task, even on a small robot with constrained computational resources.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Displays 工程技术-工程：电子与电气

CiteScore

4.60

自引率

25.60%

发文量

138

审稿时长

92 days

期刊介绍： Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including display-human interface. Technical papers on practical developments in Displays technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals intended for display technologies and human factor engineers new to the field will also occasionally featured.