Addressing Different Goal Selection Strategies In Hindsight Experience Replay With Actor-Critic Methods For Robotic Hand Manipulation
Ayman Shams, Thomas Fevens
2022 2nd International Conference on Robotics, Automation and Artificial Intelligence (RAAI), 2022-12-09
DOI: 10.1109/RAAI56146.2022.10092979
Citations: 0
Abstract
One of the most challenging problems in reinforcement learning is learning from sparse rewards. We combine Twin Delayed Deep Deterministic Policy Gradient (TD3), an off-policy reinforcement learning algorithm, with Hindsight Experience Replay (HER). The combination allows sample-efficient learning from sparse, binary rewards and avoids the need for complicated reward engineering. We use the task of manipulating objects with a robotic arm to illustrate our method. Specifically, we test six tasks: pushing, sliding, and pick-and-place in the Fetch environment, and manipulating a block, an egg, or a pen with a robotic hand. In every task we use only a binary reward that indicates whether or not the task has been completed. In a comparative study, we focus on how different goal selection strategies for the HER replay buffer affect both Deep Deterministic Policy Gradient (DDPG) and TD3. We found that HER was crucial for enabling training in these demanding settings.
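The goal selection step that the abstract refers to can be illustrated with a minimal sketch. This is not the authors' implementation: the strategy names ("final", "future", "episode") follow the common HER terminology, and the transition layout, the helper names (`sparse_reward`, `select_goals`, `relabel`, `make_toy_episode`), the tolerance, and the 0 / -1 reward convention are all assumptions chosen for illustration. The TD3 or DDPG update that would consume the relabeled buffer is not shown.

```python
import random


def sparse_reward(achieved, goal, tol=0.05):
    """Binary reward: 0.0 if the goal is reached, -1.0 otherwise (assumed convention)."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def select_goals(episode, t, strategy, k, rng):
    """Pick substitute goals for the transition at index t of one episode."""
    achieved = [step["achieved_next"] for step in episode]
    if strategy == "final":
        # Use the goal actually achieved at the end of the episode.
        return [achieved[-1]]
    if strategy == "future":
        # Sample k goals achieved at step t or later in the same episode.
        return [achieved[rng.randint(t, len(achieved) - 1)] for _ in range(k)]
    if strategy == "episode":
        # Sample k goals achieved anywhere in the same episode.
        return [rng.choice(achieved) for _ in range(k)]
    raise ValueError(f"unknown strategy: {strategy}")


def relabel(episode, strategy="future", k=4, seed=0):
    """Return the original transitions plus hindsight-relabeled copies."""
    rng = random.Random(seed)
    buffer = []
    for t, step in enumerate(episode):
        buffer.append(dict(step))  # keep the transition with its original goal
        for goal in select_goals(episode, t, strategy, k, rng):
            new_step = dict(step)
            new_step["goal"] = goal
            # Recompute the binary reward against the substituted goal.
            new_step["reward"] = sparse_reward(step["achieved_next"], goal)
            buffer.append(new_step)
    return buffer


def make_toy_episode(n=5):
    """Hypothetical 1-D episode that moves toward, but never reaches, goal 1.0."""
    goal = (1.0,)
    episode = []
    for t in range(n):
        achieved_next = (0.1 * (t + 1),)
        episode.append({
            "obs": (0.1 * t,),
            "action": (0.1,),
            "achieved_next": achieved_next,
            "goal": goal,
            "reward": sparse_reward(achieved_next, goal),
        })
    return episode


if __name__ == "__main__":
    episode = make_toy_episode()
    for strategy in ("final", "future", "episode"):
        buf = relabel(episode, strategy=strategy, k=4)
        successes = sum(1 for s in buf if s["reward"] == 0.0)
        print(strategy, len(buf), successes)
```

In the toy episode every original transition carries the failure reward, so an off-policy learner would see no success signal at all; the relabeled copies are what introduce transitions with a success reward, which is the effect the comparison of strategies is concerned with.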