An online hyper-volume action bounding approach for accelerating the process of deep reinforcement learning from multiple controllers

IF 4.2 · CAS Zone 2 (Computer Science) · JCR Q2 (Robotics)
Ali Aflakian, Alireza Rastegarpanah, Jamie Hathaway, Rustam Stolkin
{"title":"An online hyper-volume action bounding approach for accelerating the process of deep reinforcement learning from multiple controllers","authors":"Ali Aflakian,&nbsp;Alireza Rastegarpanah,&nbsp;Jamie Hathaway,&nbsp;Rustam Stolkin","doi":"10.1002/rob.22355","DOIUrl":null,"url":null,"abstract":"<p>This paper fuses ideas from reinforcement learning (RL), Learning from Demonstration (LfD), and Ensemble Learning into a single paradigm. Knowledge from a mixture of control algorithms (experts) are used to constrain the action space of the agent, enabling faster RL refining of a control policy, by avoiding unnecessary explorative actions. Domain-specific knowledge of each expert is exploited. However, the resulting policy is robust against errors of individual experts, since it is refined by a RL reward function without copying any particular demonstration. Our method has the potential to supplement existing RLfD methods when multiple algorithmic approaches are available to function as experts, specifically in tasks involving continuous action spaces. We illustrate our method in the context of a visual servoing (VS) task, in which a 7-DoF robot arm is controlled to maintain a desired pose relative to a target object. We explore four methods for bounding the actions of the RL agent during training. These methods include using a hypercube and convex hull with modified loss functions, ignoring actions outside the convex hull, and projecting actions onto the convex hull. We compare the training progress of each method using expert demonstrators, employing one expert demonstrator with the DAgger algorithm, and without using any demonstrators. Our experiments show that using the convex hull with a modified loss function not only accelerates learning but also provides the most optimal solution compared with other approaches. Furthermore, we demonstrate faster VS error convergence while maintaining higher manipulability of the arm, compared with classical image-based VS, position-based VS, and hybrid-decoupled VS.</p>","PeriodicalId":192,"journal":{"name":"Journal of Field Robotics","volume":"41 6","pages":"1814-1828"},"PeriodicalIF":4.2000,"publicationDate":"2024-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/rob.22355","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Field Robotics","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/rob.22355","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
引用次数: 0

Abstract

This paper fuses ideas from reinforcement learning (RL), learning from demonstration (LfD), and ensemble learning into a single paradigm. Knowledge from a mixture of control algorithms (experts) is used to constrain the action space of the agent, enabling faster RL refinement of a control policy by avoiding unnecessary exploratory actions. The domain-specific knowledge of each expert is exploited; however, the resulting policy is robust to the errors of individual experts, since it is refined by an RL reward function rather than by copying any particular demonstration. Our method can supplement existing RLfD methods whenever multiple algorithmic approaches are available to serve as experts, particularly in tasks with continuous action spaces. We illustrate the method on a visual servoing (VS) task, in which a 7-DoF robot arm is controlled to maintain a desired pose relative to a target object. We explore four methods for bounding the actions of the RL agent during training: a hypercube bound with a modified loss function, a convex-hull bound with a modified loss function, ignoring actions outside the convex hull, and projecting actions onto the convex hull. We compare the training progress of each method using multiple expert demonstrators, using a single expert demonstrator with the DAgger algorithm, and using no demonstrators. Our experiments show that the convex hull with a modified loss function not only accelerates learning but also yields the best solution among the approaches compared. Furthermore, we demonstrate faster VS error convergence while maintaining higher manipulability of the arm, compared with classical image-based VS, position-based VS, and hybrid decoupled VS.
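To make the bounding geometries concrete, the sketch below illustrates two of the four strategies named in the abstract: clipping the agent's action to the axis-aligned hypercube spanned by the experts' proposed actions, and projecting the action onto the convex hull of those proposals via a small simplex-constrained least-squares problem. This is a minimal illustration only, not the authors' implementation; the function names, the SLSQP solver choice, and the toy 7-DoF data are all assumptions made for the example.

```python
# Illustrative sketch (not the paper's code) of two action-bounding
# geometries: a hypercube bound and a convex-hull projection.
import numpy as np
from scipy.optimize import minimize

def hypercube_bound(action, expert_actions):
    """Clip the action to the axis-aligned box spanned by the expert proposals."""
    lo = expert_actions.min(axis=0)
    hi = expert_actions.max(axis=0)
    return np.clip(action, lo, hi)

def project_onto_hull(action, expert_actions):
    """Project the action onto the convex hull of the expert proposals.

    Solves  min_w ||w @ E - a||^2  s.t.  w >= 0, sum(w) = 1,
    i.e. finds the nearest convex combination of expert actions.
    """
    K = expert_actions.shape[0]
    w0 = np.full(K, 1.0 / K)  # start from the hull centroid

    def objective(w):
        diff = w @ expert_actions - action
        return diff @ diff

    res = minimize(
        objective, w0, method="SLSQP",
        bounds=[(0.0, 1.0)] * K,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return res.x @ expert_actions

# Hypothetical usage: three experts each propose a 7-DoF velocity command;
# the agent's raw exploratory action is pulled back into the expert region.
rng = np.random.default_rng(0)
experts = rng.normal(size=(3, 7))        # K = 3 expert proposals, 7-DoF
raw_action = rng.normal(scale=2.0, size=7)
print(hypercube_bound(raw_action, experts))
print(project_onto_hull(raw_action, experts))
```

Note the trade-off the abstract alludes to: the hypercube is cheap to compute but over-approximates the region the experts agree on, while the convex hull is tighter at the cost of solving a small optimization per step.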


Source Journal

Journal of Field Robotics (Engineering Technology: Robotics)
CiteScore: 15.00
Self-citation rate: 3.60%
Annual publications: 80
Review time: 6 months
Aims and scope: The Journal of Field Robotics seeks to promote scholarly publications dealing with the fundamentals of robotics in unstructured and dynamic environments. The Journal focuses on experimental robotics and encourages publication of work that has both theoretical and practical significance.