{"title":"在大扰动下深度强化学习的鲁棒性和回报性对抗性训练之间的权衡","authors":"Jeffrey Huang;Ho Jin Choi;Nadia Figueroa","doi":"10.1109/LRA.2023.3324590","DOIUrl":null,"url":null,"abstract":"Deep Reinforcement Learning (DRL) has become a popular approach for training robots due to its generalization promise, complex task capacity and minimal human intervention. Nevertheless, DRL-trained controllers are vulnerable to even the smallest of perturbations on its inputs which can lead to catastrophic failures in real-world human-centric environments with large and unexpected perturbations. In this work, we study the vulnerability of state-of-the-art DRL subject to large perturbations and propose a novel adversarial training framework for robust control. Our approach generates aggressive attacks on the state space and the expected state-action values to emulate real-world perturbations such as sensor noise, perception failures, physical perturbations, observations mismatch, etc. To achieve this, we reformulate the adversarial risk to yield a trade-off between rewards and robustness (TBRR). We show that TBRR-aided DRL training is robust to aggressive attacks and outperforms baselines on standard DRL benchmarks (Cartpole, Pendulum), Meta-World tasks (door manipulation) and a vision-based grasping task with a 7DoF manipulator. Finally, we show that the vision-based grasping task trained in simulation via TBRR transfers sim2real with 70% success rate subject to sensor impairment and physical perturbations without any retraining.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"8 12","pages":"8018-8025"},"PeriodicalIF":4.6000,"publicationDate":"2023-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Trade-Off Between Robustness and Rewards Adversarial Training for Deep Reinforcement Learning Under Large Perturbations\",\"authors\":\"Jeffrey Huang;Ho Jin Choi;Nadia Figueroa\",\"doi\":\"10.1109/LRA.2023.3324590\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep Reinforcement Learning (DRL) has become a popular approach for training robots due to its generalization promise, complex task capacity and minimal human intervention. Nevertheless, DRL-trained controllers are vulnerable to even the smallest of perturbations on its inputs which can lead to catastrophic failures in real-world human-centric environments with large and unexpected perturbations. In this work, we study the vulnerability of state-of-the-art DRL subject to large perturbations and propose a novel adversarial training framework for robust control. Our approach generates aggressive attacks on the state space and the expected state-action values to emulate real-world perturbations such as sensor noise, perception failures, physical perturbations, observations mismatch, etc. To achieve this, we reformulate the adversarial risk to yield a trade-off between rewards and robustness (TBRR). We show that TBRR-aided DRL training is robust to aggressive attacks and outperforms baselines on standard DRL benchmarks (Cartpole, Pendulum), Meta-World tasks (door manipulation) and a vision-based grasping task with a 7DoF manipulator. Finally, we show that the vision-based grasping task trained in simulation via TBRR transfers sim2real with 70% success rate subject to sensor impairment and physical perturbations without any retraining.\",\"PeriodicalId\":13241,\"journal\":{\"name\":\"IEEE Robotics and Automation Letters\",\"volume\":\"8 12\",\"pages\":\"8018-8025\"},\"PeriodicalIF\":4.6000,\"publicationDate\":\"2023-10-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Robotics and Automation Letters\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10284990/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ROBOTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Robotics and Automation Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10284990/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Trade-Off Between Robustness and Rewards Adversarial Training for Deep Reinforcement Learning Under Large Perturbations
Deep Reinforcement Learning (DRL) has become a popular approach for training robots due to its generalization promise, complex task capacity and minimal human intervention. Nevertheless, DRL-trained controllers are vulnerable to even the smallest of perturbations on its inputs which can lead to catastrophic failures in real-world human-centric environments with large and unexpected perturbations. In this work, we study the vulnerability of state-of-the-art DRL subject to large perturbations and propose a novel adversarial training framework for robust control. Our approach generates aggressive attacks on the state space and the expected state-action values to emulate real-world perturbations such as sensor noise, perception failures, physical perturbations, observations mismatch, etc. To achieve this, we reformulate the adversarial risk to yield a trade-off between rewards and robustness (TBRR). We show that TBRR-aided DRL training is robust to aggressive attacks and outperforms baselines on standard DRL benchmarks (Cartpole, Pendulum), Meta-World tasks (door manipulation) and a vision-based grasping task with a 7DoF manipulator. Finally, we show that the vision-based grasping task trained in simulation via TBRR transfers sim2real with 70% success rate subject to sensor impairment and physical perturbations without any retraining.
期刊介绍:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.