Learning High-Risk High-Precision Motion Control

N. Kim, Markus Kirjonen, Perttu Hämäläinen
{"title":"学习高风险高精度运动控制","authors":"N. Kim, Markus Kirjonen, Perttu Hämäläinen","doi":"10.1145/3561975.3562943","DOIUrl":null,"url":null,"abstract":"Deep reinforcement learning (DRL) algorithms for movement control are typically evaluated and benchmarked on sequential decision tasks where imprecise actions may be corrected with later actions, thus allowing high returns with noisy actions. In contrast, we focus on an under-researched class of high-risk, high-precision motion control problems where actions carry irreversible outcomes, driving sharp peaks and ridges to plague the state-action reward landscape. Using computational pool as a representative example of such problems, we propose and evaluate State-Conditioned Shooting (SCOOT), a novel DRL algorithm that builds on advantage-weighted regression (AWR) with three key modifications: 1) Performing policy optimization only using elite samples, allowing the policy to better latch on to the rare high-reward action samples; 2) Utilizing a mixture-of-experts (MoE) policy, to allow switching between reward landscape modes depending on the state; 3) Adding a distance regularization term and a learning curriculum to encourage exploring diverse strategies before adapting to the most advantageous samples. We showcase our features’ performance in learning physically-based billiard shots demonstrating high action precision and discovering multiple shot strategies for a given ball configuration.","PeriodicalId":246179,"journal":{"name":"Proceedings of the 15th ACM SIGGRAPH Conference on Motion, Interaction and Games","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learning High-Risk High-Precision Motion Control\",\"authors\":\"N. Kim, Markus Kirjonen, Perttu Hämäläinen\",\"doi\":\"10.1145/3561975.3562943\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep reinforcement learning (DRL) algorithms for movement control are typically evaluated and benchmarked on sequential decision tasks where imprecise actions may be corrected with later actions, thus allowing high returns with noisy actions. In contrast, we focus on an under-researched class of high-risk, high-precision motion control problems where actions carry irreversible outcomes, driving sharp peaks and ridges to plague the state-action reward landscape. Using computational pool as a representative example of such problems, we propose and evaluate State-Conditioned Shooting (SCOOT), a novel DRL algorithm that builds on advantage-weighted regression (AWR) with three key modifications: 1) Performing policy optimization only using elite samples, allowing the policy to better latch on to the rare high-reward action samples; 2) Utilizing a mixture-of-experts (MoE) policy, to allow switching between reward landscape modes depending on the state; 3) Adding a distance regularization term and a learning curriculum to encourage exploring diverse strategies before adapting to the most advantageous samples. 
We showcase our features’ performance in learning physically-based billiard shots demonstrating high action precision and discovering multiple shot strategies for a given ball configuration.\",\"PeriodicalId\":246179,\"journal\":{\"name\":\"Proceedings of the 15th ACM SIGGRAPH Conference on Motion, Interaction and Games\",\"volume\":\"32 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 15th ACM SIGGRAPH Conference on Motion, Interaction and Games\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3561975.3562943\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 15th ACM SIGGRAPH Conference on Motion, Interaction and Games","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3561975.3562943","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Deep reinforcement learning (DRL) algorithms for movement control are typically evaluated and benchmarked on sequential decision tasks where imprecise actions may be corrected with later actions, thus allowing high returns with noisy actions. In contrast, we focus on an under-researched class of high-risk, high-precision motion control problems where actions carry irreversible outcomes, driving sharp peaks and ridges to plague the state-action reward landscape. Using computational pool as a representative example of such problems, we propose and evaluate State-Conditioned Shooting (SCOOT), a novel DRL algorithm that builds on advantage-weighted regression (AWR) with three key modifications: 1) Performing policy optimization using only elite samples, allowing the policy to better latch on to the rare high-reward action samples; 2) Utilizing a mixture-of-experts (MoE) policy to allow switching between reward landscape modes depending on the state; 3) Adding a distance regularization term and a learning curriculum to encourage exploring diverse strategies before adapting to the most advantageous samples. We showcase our features' performance in learning physically-based billiard shots, demonstrating high action precision and discovering multiple shot strategies for a given ball configuration.
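
To make the first two modifications concrete, below is a minimal, illustrative PyTorch sketch of an advantage-weighted regression update restricted to elite samples, paired with a state-conditioned mixture-of-experts Gaussian policy. All names, network sizes, and hyperparameters (MoEPolicy, elite_frac, beta, the number of experts) are assumptions for exposition, not the authors' implementation; the paper's distance regularization term and learning curriculum are not sketched here.

# Illustrative sketch only: elite-filtered AWR with a mixture-of-experts
# policy, in the spirit of the abstract's description of SCOOT. Hedged
# assumptions: class/function names, architectures, and hyperparameters
# below are invented for exposition and do not come from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEPolicy(nn.Module):
    """State-conditioned mixture of Gaussian experts: a gating network
    weights K experts, each proposing its own action mean and log-std,
    so the policy can commit to different reward-landscape modes
    (e.g., shot strategies) depending on the state."""
    def __init__(self, state_dim, action_dim, n_experts=4, hidden=128):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_experts))
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 2 * action_dim))
            for _ in range(n_experts)])

    def log_prob(self, state, action):
        # Mixture log-likelihood: logsumexp over experts of
        # (gate log-weight + expert Gaussian log-density).
        log_w = F.log_softmax(self.gate(state), dim=-1)           # [B, K]
        comp_logp = []
        for expert in self.experts:
            mean, log_std = expert(state).chunk(2, dim=-1)
            dist = torch.distributions.Normal(mean, log_std.clamp(-5, 2).exp())
            comp_logp.append(dist.log_prob(action).sum(-1))       # [B]
        comp_logp = torch.stack(comp_logp, dim=-1)                # [B, K]
        return torch.logsumexp(log_w + comp_logp, dim=-1)         # [B]

def elite_awr_loss(policy, states, actions, advantages,
                   elite_frac=0.1, beta=1.0):
    """AWR loss restricted to elite samples: keep only the top
    `elite_frac` fraction of samples by advantage, then weight their
    log-likelihood by exp(advantage / beta) as in standard AWR."""
    k = max(1, int(elite_frac * len(advantages)))
    elite_idx = torch.topk(advantages, k).indices
    s, a, adv = states[elite_idx], actions[elite_idx], advantages[elite_idx]
    weights = torch.exp(adv / beta).clamp(max=20.0)  # clip for stability
    return -(weights * policy.log_prob(s, a)).mean()

Restricting the regression to the top fraction of samples by advantage is what lets the policy latch on to rare high-reward actions in a sharply peaked reward landscape, while the mixture parameterization lets the gate select different experts, and hence different strategies, for different ball configurations.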