Evaluation of Techniques for Sim2Real Reinforcement Learning

Mahesh Ranaweera, Q. Mahmoud
{"title":"Sim2Real强化学习技术评价","authors":"Mahesh Ranaweera, Q. Mahmoud","doi":"10.32473/flairs.36.133317","DOIUrl":null,"url":null,"abstract":"Reinforcement learning (RL) has demonstrated promising results in transferring learned policies from simulation to real-world environments. However, inconsistencies and discrepancies between the two environments cause a negative transfer. The phenomenon is commonly known as the “reality gap.” The reality gap prevents learned policies from generalizing to the physical environment. This paper aims to evaluate techniques to improve sim2real learning and bridge the reality gap using RL. For this research, a 3-DOF Stewart Platform was built virtually and physically. The goal of the platform was to guide and balance the marble towards the center of the Stewart platform. Custom API was created to induce noise, manipulate in-game physics, dynamics, and lighting conditions, and perform domain randomization to improve generalization. Two RL algorithms; Q-Learning and Actor-Critic were implemented to train the agent and to evaluate the performance in bridging the reality gap. This paper outlines the techniques utilized to create noise, domain randomization, perform training, results, and observations. Overall, the obtained results show the effectiveness of domain randomization and inducing noise during the agents' learning process. Additionally, the findings provide valuable insights into implementing sim2real RL algorithms to bridge the reality gap.","PeriodicalId":302103,"journal":{"name":"The International FLAIRS Conference Proceedings","volume":"396 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Evaluation of Techniques for Sim2Real Reinforcement Learning\",\"authors\":\"Mahesh Ranaweera, Q. 
Mahmoud\",\"doi\":\"10.32473/flairs.36.133317\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reinforcement learning (RL) has demonstrated promising results in transferring learned policies from simulation to real-world environments. However, inconsistencies and discrepancies between the two environments cause a negative transfer. The phenomenon is commonly known as the “reality gap.” The reality gap prevents learned policies from generalizing to the physical environment. This paper aims to evaluate techniques to improve sim2real learning and bridge the reality gap using RL. For this research, a 3-DOF Stewart Platform was built virtually and physically. The goal of the platform was to guide and balance the marble towards the center of the Stewart platform. Custom API was created to induce noise, manipulate in-game physics, dynamics, and lighting conditions, and perform domain randomization to improve generalization. Two RL algorithms; Q-Learning and Actor-Critic were implemented to train the agent and to evaluate the performance in bridging the reality gap. This paper outlines the techniques utilized to create noise, domain randomization, perform training, results, and observations. Overall, the obtained results show the effectiveness of domain randomization and inducing noise during the agents' learning process. 
Additionally, the findings provide valuable insights into implementing sim2real RL algorithms to bridge the reality gap.\",\"PeriodicalId\":302103,\"journal\":{\"name\":\"The International FLAIRS Conference Proceedings\",\"volume\":\"396 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The International FLAIRS Conference Proceedings\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.32473/flairs.36.133317\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The International FLAIRS Conference Proceedings","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.32473/flairs.36.133317","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Reinforcement learning (RL) has demonstrated promising results in transferring learned policies from simulation to real-world environments. However, inconsistencies and discrepancies between the two environments cause negative transfer, a phenomenon commonly known as the "reality gap." The reality gap prevents learned policies from generalizing to the physical environment. This paper aims to evaluate techniques to improve sim2real learning and bridge the reality gap using RL. For this research, a 3-DOF Stewart platform was built both virtually and physically. The goal of the platform was to guide and balance a marble towards the center of the Stewart platform. A custom API was created to inject noise, manipulate in-game physics, dynamics, and lighting conditions, and perform domain randomization to improve generalization. Two RL algorithms, Q-Learning and Actor-Critic, were implemented to train the agent and to evaluate performance in bridging the reality gap. This paper outlines the techniques used to inject noise and perform domain randomization, the training procedure, the results, and the observations. Overall, the obtained results show the effectiveness of domain randomization and noise injection during the agent's learning process. Additionally, the findings provide valuable insights into implementing sim2real RL algorithms to bridge the reality gap.
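The techniques the abstract names — domain randomization, noise injection, and tabular Q-Learning — can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's actual API: all parameter names, ranges, and hyperparameter values (gravity bounds, noise sigma, alpha, gamma, action count) are assumptions made for illustration.

```python
import random

def randomize_domain():
    """Domain randomization sketch: resample simulator parameters each
    episode so the policy cannot overfit to one physics configuration.
    The parameter names and ranges here are illustrative assumptions."""
    return {
        "gravity": random.uniform(9.6, 10.0),       # m/s^2
        "marble_mass": random.uniform(0.02, 0.08),  # kg
        "friction": random.uniform(0.1, 0.5),
        "light_intensity": random.uniform(0.5, 1.5),
    }

def noisy_observation(obs, sigma=0.01):
    """Noise-injection sketch: perturb each simulated sensor reading
    with Gaussian noise to mimic real-sensor imperfections."""
    return [x + random.gauss(0.0, sigma) for x in obs]

def q_update(q, state, action, reward, next_state,
             alpha=0.1, gamma=0.99, n_actions=4):
    """Standard tabular Q-Learning update (the first of the two
    algorithms the paper evaluates). q maps (state, action) -> value."""
    best_next = max(q.get((next_state, a), 0.0) for a in range(n_actions))
    td_target = reward + gamma * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (td_target - old)
```

During training, one would call `randomize_domain()` at the start of each episode and wrap every state read in `noisy_observation()` before feeding it to the agent, so the learned policy sees a distribution of environments rather than a single idealized one.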