Trade-Off Between Robustness and Rewards: Adversarial Training for Deep Reinforcement Learning Under Large Perturbations

Impact Factor: 4.6 · CAS Region 2 (Computer Science) · JCR Q2 (Robotics)
Jeffrey Huang, Ho Jin Choi, Nadia Figueroa
{"title":"Trade-Off Between Robustness and Rewards Adversarial Training for Deep Reinforcement Learning Under Large Perturbations","authors":"Jeffrey Huang;Ho Jin Choi;Nadia Figueroa","doi":"10.1109/LRA.2023.3324590","DOIUrl":null,"url":null,"abstract":"Deep Reinforcement Learning (DRL) has become a popular approach for training robots due to its generalization promise, complex task capacity and minimal human intervention. Nevertheless, DRL-trained controllers are vulnerable to even the smallest of perturbations on its inputs which can lead to catastrophic failures in real-world human-centric environments with large and unexpected perturbations. In this work, we study the vulnerability of state-of-the-art DRL subject to large perturbations and propose a novel adversarial training framework for robust control. Our approach generates aggressive attacks on the state space and the expected state-action values to emulate real-world perturbations such as sensor noise, perception failures, physical perturbations, observations mismatch, etc. To achieve this, we reformulate the adversarial risk to yield a trade-off between rewards and robustness (TBRR). We show that TBRR-aided DRL training is robust to aggressive attacks and outperforms baselines on standard DRL benchmarks (Cartpole, Pendulum), Meta-World tasks (door manipulation) and a vision-based grasping task with a 7DoF manipulator. Finally, we show that the vision-based grasping task trained in simulation via TBRR transfers sim2real with 70% success rate subject to sensor impairment and physical perturbations without any retraining.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"8 12","pages":"8018-8025"},"PeriodicalIF":4.6000,"publicationDate":"2023-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Robotics and Automation Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10284990/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Citations: 0

Abstract

Deep Reinforcement Learning (DRL) has become a popular approach for training robots due to its promise of generalization, its capacity for complex tasks, and its minimal need for human intervention. Nevertheless, DRL-trained controllers are vulnerable to even the smallest perturbations of their inputs, which can lead to catastrophic failures in real-world human-centric environments with large and unexpected perturbations. In this work, we study the vulnerability of state-of-the-art DRL subject to large perturbations and propose a novel adversarial training framework for robust control. Our approach generates aggressive attacks on the state space and the expected state-action values to emulate real-world perturbations such as sensor noise, perception failures, physical perturbations, and observation mismatch. To achieve this, we reformulate the adversarial risk to yield a trade-off between rewards and robustness (TBRR). We show that TBRR-aided DRL training is robust to aggressive attacks and outperforms baselines on standard DRL benchmarks (Cartpole, Pendulum), Meta-World tasks (door manipulation), and a vision-based grasping task with a 7-DoF manipulator. Finally, we show that the vision-based grasping policy trained in simulation via TBRR transfers sim2real with a 70% success rate under sensor impairment and physical perturbations, without any retraining.
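The abstract does not spell out the TBRR objective, but the generic pattern it builds on, adversarial attacks on the observed state combined with a weighted mix of clean and attacked training losses, can be sketched. The following is a minimal illustration under assumed interfaces, not the paper's method: the network q_net, the attack budget epsilon, the trade-off weight beta, and the batch layout are all hypothetical placeholders, and the attack shown is a standard FGSM-style perturbation rather than the paper's exact formulation.

```python
# Generic sketch of adversarial training with state-space attacks, in the
# spirit of the abstract. NOT the paper's TBRR method: q_net, epsilon, beta,
# and the batch layout are hypothetical placeholders for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_state_attack(q_net: nn.Module, state: torch.Tensor,
                      action: torch.Tensor, epsilon: float) -> torch.Tensor:
    """Perturb the observed state within an L-inf ball of radius epsilon so
    as to drive down the expected state-action value Q(s, a), emulating
    worst-case sensor noise or observation mismatch."""
    state = state.clone().detach().requires_grad_(True)
    q_sa = q_net(state).gather(1, action.unsqueeze(1)).squeeze(1)
    q_sa.sum().backward()  # gradient of Q(s, a) w.r.t. the observation
    with torch.no_grad():
        adv_state = state - epsilon * state.grad.sign()  # descend Q(s, a)
    return adv_state.detach()

def traded_off_td_loss(q_net: nn.Module, target_net: nn.Module, batch,
                       epsilon: float, beta: float, gamma: float = 0.99):
    """Mix the TD loss on clean and attacked states with weight beta, a
    generic stand-in for a reward-vs-robustness trade-off knob.
    Caller should zero q_net's grads before backpropagating this loss,
    since the attack itself runs a backward pass."""
    s, a, r, s2, done = batch  # assumed (state, action, reward, next, done)
    with torch.no_grad():
        td_target = r + gamma * (1 - done) * target_net(s2).max(1).values
    def td(states):
        q = q_net(states).gather(1, a.unsqueeze(1)).squeeze(1)
        return F.smooth_l1_loss(q, td_target)
    adv_s = fgsm_state_attack(q_net, s, a, epsilon)
    return (1 - beta) * td(s) + beta * td(adv_s)
```

In this sketch, beta = 0 recovers ordinary reward-driven training while beta = 1 trains only on attacked observations, making the reward-robustness trade-off an explicit dial; how TBRR actually balances the two terms is defined in the paper itself.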
Source Journal
IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)
CiteScore: 9.60
Self-citation rate: 15.40%
Articles per year: 1428
Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.