Evaluation of Techniques for Sim2Real Reinforcement Learning

Mahesh Ranaweera, Q. Mahmoud
{"title":"Sim2Real强化学习技术评价","authors":"Mahesh Ranaweera, Q. Mahmoud","doi":"10.32473/flairs.36.133317","DOIUrl":null,"url":null,"abstract":"Reinforcement learning (RL) has demonstrated promising results in transferring learned policies from simulation to real-world environments. However, inconsistencies and discrepancies between the two environments cause a negative transfer. The phenomenon is commonly known as the “reality gap.” The reality gap prevents learned policies from generalizing to the physical environment. This paper aims to evaluate techniques to improve sim2real learning and bridge the reality gap using RL. For this research, a 3-DOF Stewart Platform was built virtually and physically. The goal of the platform was to guide and balance the marble towards the center of the Stewart platform. Custom API was created to induce noise, manipulate in-game physics, dynamics, and lighting conditions, and perform domain randomization to improve generalization. Two RL algorithms; Q-Learning and Actor-Critic were implemented to train the agent and to evaluate the performance in bridging the reality gap. This paper outlines the techniques utilized to create noise, domain randomization, perform training, results, and observations. Overall, the obtained results show the effectiveness of domain randomization and inducing noise during the agents' learning process. Additionally, the findings provide valuable insights into implementing sim2real RL algorithms to bridge the reality gap.","PeriodicalId":302103,"journal":{"name":"The International FLAIRS Conference Proceedings","volume":"396 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Evaluation of Techniques for Sim2Real Reinforcement Learning\",\"authors\":\"Mahesh Ranaweera, Q. 
Mahmoud\",\"doi\":\"10.32473/flairs.36.133317\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reinforcement learning (RL) has demonstrated promising results in transferring learned policies from simulation to real-world environments. However, inconsistencies and discrepancies between the two environments cause a negative transfer. The phenomenon is commonly known as the “reality gap.” The reality gap prevents learned policies from generalizing to the physical environment. This paper aims to evaluate techniques to improve sim2real learning and bridge the reality gap using RL. For this research, a 3-DOF Stewart Platform was built virtually and physically. The goal of the platform was to guide and balance the marble towards the center of the Stewart platform. Custom API was created to induce noise, manipulate in-game physics, dynamics, and lighting conditions, and perform domain randomization to improve generalization. Two RL algorithms; Q-Learning and Actor-Critic were implemented to train the agent and to evaluate the performance in bridging the reality gap. This paper outlines the techniques utilized to create noise, domain randomization, perform training, results, and observations. Overall, the obtained results show the effectiveness of domain randomization and inducing noise during the agents' learning process. 
Additionally, the findings provide valuable insights into implementing sim2real RL algorithms to bridge the reality gap.\",\"PeriodicalId\":302103,\"journal\":{\"name\":\"The International FLAIRS Conference Proceedings\",\"volume\":\"396 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The International FLAIRS Conference Proceedings\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.32473/flairs.36.133317\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The International FLAIRS Conference Proceedings","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.32473/flairs.36.133317","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Reinforcement learning (RL) has demonstrated promising results in transferring learned policies from simulation to real-world environments. However, inconsistencies and discrepancies between the two environments cause negative transfer, a phenomenon commonly known as the "reality gap." The reality gap prevents learned policies from generalizing to the physical environment. This paper aims to evaluate techniques to improve sim2real learning and bridge the reality gap using RL. For this research, a 3-DOF Stewart platform was built both virtually and physically. The goal of the platform was to guide and balance a marble towards the center of the Stewart platform. A custom API was created to inject noise, manipulate in-game physics, dynamics, and lighting conditions, and perform domain randomization to improve generalization. Two RL algorithms, Q-Learning and Actor-Critic, were implemented to train the agent and to evaluate performance in bridging the reality gap. This paper outlines the techniques used to inject noise and perform domain randomization, the training procedure, the results, and the observations. Overall, the obtained results show the effectiveness of domain randomization and noise injection during the agent's learning process. Additionally, the findings provide valuable insights into implementing sim2real RL algorithms to bridge the reality gap.
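The techniques the abstract names — domain randomization, noise injection, and tabular Q-Learning — can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's actual API: all parameter names, ranges, and hyperparameter values (gravity bounds, noise sigma, alpha, gamma, action count) are assumptions made for illustration.

```python
import random

def randomize_domain():
    """Domain randomization sketch: resample simulator parameters each
    episode so the policy cannot overfit to one physics configuration.
    The parameter names and ranges here are illustrative assumptions."""
    return {
        "gravity": random.uniform(9.6, 10.0),       # m/s^2
        "marble_mass": random.uniform(0.02, 0.08),  # kg
        "friction": random.uniform(0.1, 0.5),
        "light_intensity": random.uniform(0.5, 1.5),
    }

def noisy_observation(obs, sigma=0.01):
    """Noise-injection sketch: perturb each simulated sensor reading
    with Gaussian noise to mimic real-sensor imperfections."""
    return [x + random.gauss(0.0, sigma) for x in obs]

def q_update(q, state, action, reward, next_state,
             alpha=0.1, gamma=0.99, n_actions=4):
    """Standard tabular Q-Learning update (the first of the two
    algorithms the paper evaluates). q maps (state, action) -> value."""
    best_next = max(q.get((next_state, a), 0.0) for a in range(n_actions))
    td_target = reward + gamma * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (td_target - old)
```

During training, one would call `randomize_domain()` at the start of each episode and wrap every state read in `noisy_observation()` before feeding it to the agent, so the learned policy sees a distribution of environments rather than a single idealized one.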