Effective Search for Control Hierarchies Within the Policy Decomposition Framework

Impact factor 4.6; Zone 2, Computer Science; JCR Q2, Robotics
Ashwin Khadke;Hartmut Geyer
DOI: 10.1109/LRA.2024.3483635
Journal: IEEE Robotics and Automation Letters, vol. 9, no. 12, pp. 11114-11121
Publication date: 2024-10-17
Publication type: Journal Article
URL: https://ieeexplore.ieee.org/document/10721360/
Citation count: 0

Abstract

Policy decomposition is a novel framework for approximating optimal control policies of complex dynamical systems with a hierarchy of policies derived from smaller but tractable subsystems. It stands out among hierarchical control methods by estimating a priori how well the closed-loop behavior of different control hierarchies matches the optimal policy. However, the number of possible hierarchies grows prohibitively with the number of inputs and the dimension of the system's state space, making it unrealistic to estimate the closed-loop performance of all hierarchies. Here, we present two search methods, based on a Genetic Algorithm and Monte-Carlo Tree Search, that tackle this combinatorial challenge, and demonstrate that it is indeed surmountable. We showcase the efficacy of our search methods and the generality of the framework by finding hierarchies for the control of three distinct robotic systems: a simplified biped, a planar manipulator, and a quadcopter. Compared with heuristically designed hierarchies, the discovered ones either provide improved closed-loop performance or can be computed in minimal time with marginally worse control performance; they also exceed the control performance of policies obtained with popular deep reinforcement learning methods.
Source journal
IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)
CiteScore: 9.60
Self-citation rate: 15.40%
Articles published: 1428
Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.