SHIRE: Enhancing Sample Efficiency using Human Intuition in REinforcement Learning

Amogh Joshi, Adarsh Kumar Kosta, Kaushik Roy
Published in: arXiv - CS - Neural and Evolutionary Computing, 2024-09-16. DOI: arxiv-2409.09990 (https://doi.org/arxiv-2409.09990)
Citations: 0

Abstract

The ability of neural networks to perform robotic perception and control tasks such as depth and optical flow estimation, simultaneous localization and mapping (SLAM), and automatic control has led to their widespread adoption in recent years. Deep Reinforcement Learning (Deep RL) has been used extensively in these settings, as it does not have the unsustainable training costs associated with supervised learning. However, Deep RL suffers from poor sample efficiency, i.e., it requires a large number of environment interactions to converge to an acceptable solution. Modern RL algorithms such as Deep Q-Learning and Soft Actor-Critic attempt to remedy this shortcoming but cannot provide the explainability required in applications such as autonomous robotics. Humans intuitively understand the long-time-horizon sequential tasks common in robotics. Properly using such intuition can make RL policies more explainable while enhancing their sample efficiency. In this work, we propose SHIRE, a novel framework for encoding human intuition using Probabilistic Graphical Models (PGMs) and using it in the Deep RL training pipeline to enhance sample efficiency. Our framework achieves 25-78% sample efficiency gains across the environments we evaluate at negligible overhead cost. Additionally, by teaching RL agents the encoded elementary behavior, SHIRE enhances policy explainability. A real-world demonstration further highlights the efficacy of policies trained using our framework.
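The abstract does not say how the PGM is constructed or how it enters the training objective, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a CartPole-like task, a toy two-level Bayesian network with made-up probabilities, and a hypothetical penalty term (`intuition_loss`) added to the usual RL loss only when the network is confident about the intuitive action. All names, tables, the threshold, and the weight are illustrative.

```python
import math

# Hypothetical CPT: P(falling_right | pole_leans_right, pole_rotating_right).
P_FALL_RIGHT = {
    (True, True): 0.95,
    (True, False): 0.50,
    (False, True): 0.50,
    (False, False): 0.05,
}

# Hypothetical CPT: P(push_right | falling_right).
P_PUSH_GIVEN_FALL = {True: 0.95, False: 0.05}


def intuition_prob_right(angle, angular_velocity):
    """Marginalise over the hidden 'falling_right' node by enumeration."""
    p_fall = P_FALL_RIGHT[(angle > 0.0, angular_velocity > 0.0)]
    return (P_PUSH_GIVEN_FALL[True] * p_fall
            + P_PUSH_GIVEN_FALL[False] * (1.0 - p_fall))


def intuition_loss(policy_prob_right, angle, angular_velocity, threshold=0.9):
    """Negative log-likelihood of the intuitive action under the policy.

    Returns 0.0 when the toy network is not confident, so the penalty
    only applies in states where the encoded intuition is clear.
    """
    p = intuition_prob_right(angle, angular_velocity)
    if max(p, 1.0 - p) < threshold:
        return 0.0
    eps = 1e-8
    q = min(max(policy_prob_right, eps), 1.0 - eps)
    return -math.log(q) if p >= 0.5 else -math.log(1.0 - q)


def total_loss(rl_loss, policy_prob_right, angle, angular_velocity, weight=0.5):
    """Assumed combination: standard RL loss plus a weighted intuition penalty."""
    penalty = intuition_loss(policy_prob_right, angle, angular_velocity)
    return rl_loss + weight * penalty


if __name__ == "__main__":
    # Pole leaning and rotating right: the toy network favours pushing right.
    print(intuition_prob_right(0.1, 0.2))
    # A policy that agrees with the intuition is penalised less than one that does not.
    print(total_loss(1.0, 0.9, 0.1, 0.2), total_loss(1.0, 0.1, 0.1, 0.2))
```

In this sketch the penalty vanishes in ambiguous states (for example, a pole leaning right but rotating left), which leaves the RL objective untouched wherever the encoded intuition has nothing definite to say.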