Learning Optimal Parameterized Policy for High Level Strategies in a Game Setting

Ravi Prakash, Mohit Vohra, L. Behera
{"title":"在博弈设置中学习高级策略的最优参数化策略","authors":"Ravi Prakash, Mohit Vohra, L. Behera","doi":"10.1109/RO-MAN46459.2019.8956383","DOIUrl":null,"url":null,"abstract":"Complex and interactive robot manipulation skills such as playing a game of table tennis against a human opponent is a multifaceted challenge and a novel problem. Accurate dynamic trajectory generation in such dynamic situations and an appropriate controller in order to respond to the incoming table tennis ball from the opponent is only a prerequisite to win the game. Decision making is a major part of an intelligent robot and a policy is needed to choose and execute the action which receives highest reward. In this paper, we address this very important problem on how to learn the higher level optimal strategies that enable competitive behaviour with humans in such an interactive game setting. This paper presents a novel technique to learn a higher level strategy for the game of table tennis using P-Q Learning (a mixture of Pavlovian learning and Q-learning) to learn a parameterized policy. The cooperative learning framework of Kohenon Self Organizing Map (KSOM) along with Replay Memory is employed for faster strategy learning in this short horizon problem. The strategy is learnt in simulation, using a simulated human opponent and an ideal robot that can perform hitting motion in its workspace accurately. We show that our method is able to improve the average received reward significantly in comparison to the other state-of-the-art methods.","PeriodicalId":286478,"journal":{"name":"2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Learning Optimal Parameterized Policy for High Level Strategies in a Game Setting\",\"authors\":\"Ravi Prakash, Mohit Vohra, L. Behera\",\"doi\":\"10.1109/RO-MAN46459.2019.8956383\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Complex and interactive robot manipulation skills such as playing a game of table tennis against a human opponent is a multifaceted challenge and a novel problem. Accurate dynamic trajectory generation in such dynamic situations and an appropriate controller in order to respond to the incoming table tennis ball from the opponent is only a prerequisite to win the game. Decision making is a major part of an intelligent robot and a policy is needed to choose and execute the action which receives highest reward. In this paper, we address this very important problem on how to learn the higher level optimal strategies that enable competitive behaviour with humans in such an interactive game setting. This paper presents a novel technique to learn a higher level strategy for the game of table tennis using P-Q Learning (a mixture of Pavlovian learning and Q-learning) to learn a parameterized policy. The cooperative learning framework of Kohenon Self Organizing Map (KSOM) along with Replay Memory is employed for faster strategy learning in this short horizon problem. The strategy is learnt in simulation, using a simulated human opponent and an ideal robot that can perform hitting motion in its workspace accurately. 
We show that our method is able to improve the average received reward significantly in comparison to the other state-of-the-art methods.\",\"PeriodicalId\":286478,\"journal\":{\"name\":\"2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)\",\"volume\":\"70 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/RO-MAN46459.2019.8956383\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RO-MAN46459.2019.8956383","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2

Abstract

Complex, interactive robot manipulation skills, such as playing table tennis against a human opponent, pose a multifaceted and novel challenge. Accurate trajectory generation in such dynamic situations, together with an appropriate controller to respond to the ball incoming from the opponent, is only a prerequisite for winning the game. Decision making is a major part of an intelligent robot, and a policy is needed to choose and execute the action that receives the highest reward. In this paper, we address the problem of learning the higher-level optimal strategies that enable competitive behaviour against humans in such an interactive game setting. We present a novel technique for learning a higher-level strategy for the game of table tennis, using P-Q Learning (a mixture of Pavlovian learning and Q-learning) to learn a parameterized policy. The cooperative learning framework of the Kohonen Self-Organizing Map (KSOM), together with a replay memory, is employed for faster strategy learning in this short-horizon problem. The strategy is learnt in simulation, using a simulated human opponent and an ideal robot that can accurately perform hitting motions in its workspace. We show that our method improves the average received reward significantly in comparison with other state-of-the-art methods.
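
The abstract names the ingredients of the method (Q-learning over a parameterized policy, a Kohonen Self-Organizing Map, and a replay memory) but not its algorithmic details, which are in the full paper. As a rough illustration of how such ingredients can fit together, the following is a minimal sketch assuming a KSOM that quantizes continuous game states into discrete indices, tabular Q-learning over those indices, and uniform sampling from a replay memory. The `env` interface, all hyperparameters, and the state/action encodings are hypothetical stand-ins, not the authors' P-Q Learning implementation.

```python
# Illustrative sketch only: P-Q Learning details are not given in the
# abstract, so every name and hyperparameter below is a hypothetical
# stand-in rather than the authors' implementation.
import random
from collections import deque

import numpy as np


class KSOM:
    """Kohonen Self-Organizing Map that quantizes continuous game
    states (e.g. incoming-ball features) into discrete node indices."""

    def __init__(self, n_nodes, state_dim, lr=0.1, sigma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(size=(n_nodes, state_dim))
        self.lr = lr
        self.sigma = sigma

    def winner(self, state):
        # Index of the node whose weight vector is closest to the state.
        return int(np.argmin(np.linalg.norm(self.weights - state, axis=1)))

    def update(self, state):
        # Classic Kohonen rule: pull the winner and its neighbours
        # toward the input with a Gaussian neighbourhood on indices.
        w = self.winner(state)
        idx = np.arange(len(self.weights))
        h = np.exp(-((idx - w) ** 2) / (2.0 * self.sigma ** 2))
        self.weights += self.lr * h[:, None] * (state - self.weights)
        return w


def train(env, n_nodes=50, n_actions=8, episodes=500,
          gamma=0.9, alpha=0.2, eps=0.1, batch=32):
    """Q-learning over KSOM-quantized states with a replay memory.
    `env` is a hypothetical interface exposing reset() -> state,
    step(action) -> (next_state, reward, done), and state_dim."""
    som = KSOM(n_nodes, env.state_dim)
    q = np.zeros((n_nodes, n_actions))   # tabular Q over SOM nodes
    memory = deque(maxlen=10_000)        # replay memory of transitions

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            s = som.update(state)
            # Epsilon-greedy choice of a discrete strategy index; in the
            # paper's setting each index would select one parameterized
            # hitting policy.
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = int(np.argmax(q[s]))
            next_state, reward, done = env.step(a)
            memory.append((s, a, reward, som.winner(next_state), done))
            state = next_state

            # Replay: re-apply the Q-learning update to past transitions
            # to speed up learning in this short-horizon problem.
            for s_, a_, r_, s2_, d_ in random.sample(
                    memory, min(batch, len(memory))):
                target = r_ + (0.0 if d_ else gamma * np.max(q[s2_]))
                q[s_, a_] += alpha * (target - q[s_, a_])

    return som, q
```

The Pavlovian component of P-Q Learning and the cooperative interaction between the KSOM and the learned policy described in the paper are not reproduced here; the sketch only shows how a SOM-quantized state space, an epsilon-greedy Q-learner, and a replay memory can be wired together.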