基于深度强化学习的导航技能对人类偏好的快速适应

Jinyoung Choi, C. Dance, Jung-Eun Kim, Kyungsik Park, Jaehun Han, Joonho Seo, Minsu Kim
{"title":"基于深度强化学习的导航技能对人类偏好的快速适应","authors":"Jinyoung Choi, C. Dance, Jung-Eun Kim, Kyungsik Park, Jaehun Han, Joonho Seo, Minsu Kim","doi":"10.1109/ICRA40945.2020.9197159","DOIUrl":null,"url":null,"abstract":"Deep reinforcement learning (RL) is being actively studied for robot navigation due to its promise of superior performance and robustness. However, most existing deep RL navigation agents are trained using fixed parameters, such as maximum velocities and weightings of reward components. Since the optimal choice of parameters depends on the use-case, it can be difficult to deploy such existing methods in a variety of real-world service scenarios. In this paper, we propose a novel deep RL navigation method that can adapt its policy to a wide range of parameters and reward functions without expensive retraining. Additionally, we explore a Bayesian deep learning method to optimize these parameters that requires only a small amount of preference data. We empirically show that our method can learn diverse navigation skills and quickly adapt its policy to a given performance metric or to human preference. We also demonstrate our method in real-world scenarios.","PeriodicalId":6859,"journal":{"name":"2020 IEEE International Conference on Robotics and Automation (ICRA)","volume":"30 1","pages":"3363-3370"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"Fast Adaptation of Deep Reinforcement Learning-Based Navigation Skills to Human Preference\",\"authors\":\"Jinyoung Choi, C. Dance, Jung-Eun Kim, Kyungsik Park, Jaehun Han, Joonho Seo, Minsu Kim\",\"doi\":\"10.1109/ICRA40945.2020.9197159\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep reinforcement learning (RL) is being actively studied for robot navigation due to its promise of superior performance and robustness. However, most existing deep RL navigation agents are trained using fixed parameters, such as maximum velocities and weightings of reward components. Since the optimal choice of parameters depends on the use-case, it can be difficult to deploy such existing methods in a variety of real-world service scenarios. In this paper, we propose a novel deep RL navigation method that can adapt its policy to a wide range of parameters and reward functions without expensive retraining. Additionally, we explore a Bayesian deep learning method to optimize these parameters that requires only a small amount of preference data. We empirically show that our method can learn diverse navigation skills and quickly adapt its policy to a given performance metric or to human preference. We also demonstrate our method in real-world scenarios.\",\"PeriodicalId\":6859,\"journal\":{\"name\":\"2020 IEEE International Conference on Robotics and Automation (ICRA)\",\"volume\":\"30 1\",\"pages\":\"3363-3370\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE International Conference on Robotics and Automation (ICRA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICRA40945.2020.9197159\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Conference on Robotics and Automation (ICRA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICRA40945.2020.9197159","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 15

摘要

深度强化学习(RL)由于其优越的性能和鲁棒性而被积极研究用于机器人导航。然而,大多数现有的深度强化学习导航代理都是使用固定参数进行训练的,比如最大速度和奖励成分的权重。由于参数的最佳选择取决于用例,因此很难在各种实际服务场景中部署这种现有方法。在本文中,我们提出了一种新的深度强化学习导航方法,该方法可以使其策略适应广泛的参数和奖励函数,而无需昂贵的再训练。此外,我们还探索了一种贝叶斯深度学习方法来优化这些只需要少量偏好数据的参数。我们的经验表明,我们的方法可以学习不同的导航技能,并快速调整其策略以适应给定的性能指标或人类偏好。我们还在实际场景中演示了我们的方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Fast Adaptation of Deep Reinforcement Learning-Based Navigation Skills to Human Preference
Deep reinforcement learning (RL) is being actively studied for robot navigation due to its promise of superior performance and robustness. However, most existing deep RL navigation agents are trained using fixed parameters, such as maximum velocities and weightings of reward components. Since the optimal choice of parameters depends on the use-case, it can be difficult to deploy such existing methods in a variety of real-world service scenarios. In this paper, we propose a novel deep RL navigation method that can adapt its policy to a wide range of parameters and reward functions without expensive retraining. Additionally, we explore a Bayesian deep learning method to optimize these parameters that requires only a small amount of preference data. We empirically show that our method can learn diverse navigation skills and quickly adapt its policy to a given performance metric or to human preference. We also demonstrate our method in real-world scenarios.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信