Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning; Pub Date: 2023-06-09; DOI: 10.48550/arXiv.2306.05873

Ezgi Korkmaz, Jonah Brown-Cohen

Pages: 17534-17543

Citations: 1

Abstract

Learning in MDPs with highly complex state representations is now possible thanks to multiple advances in reinforcement learning algorithm design. However, this increase in complexity, together with the growth in the dimensionality of the observations, has come at the cost of volatility that adversarial attacks can exploit (i.e., by moving along worst-case directions in the observation space). To address this policy instability problem, we propose a novel method that detects the presence of these non-robust directions via a local quadratic approximation of the deep neural policy loss. Our method provides a theoretical basis for the fundamental cut-off between safe observations and adversarial observations. Furthermore, our technique is computationally efficient and does not depend on the methods used to produce the worst-case directions. We conduct extensive experiments in the Arcade Learning Environment with several different adversarial attack techniques. Most significantly, we demonstrate the effectiveness of our approach even in the setting where non-robust directions are explicitly optimized to circumvent our proposed method.
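
To make the phrase "local quadratic approximation of the policy loss" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical toy policy (linear logits followed by a softmax) and a hypothetical loss, the cross-entropy of the policy against a fixed action; the abstract specifies neither, and it does not give the actual test statistic or cut-off. The sketch only shows how a second-order Taylor expansion of such a loss around an observation `s` is formed, and how the first- and second-order terms along the gradient direction can be read off.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def loss(W, s, a):
    """Cross-entropy -log pi(a|s) of a toy linear-softmax policy (assumed, not from the paper)."""
    return -np.log(softmax(W @ s)[a])

def grad_and_hessian(W, s, a):
    """Exact gradient and Hessian of the toy loss with respect to the observation s."""
    p = softmax(W @ s)
    onehot = np.eye(len(p))[a]
    g = W.T @ (p - onehot)                       # dJ/ds
    H = W.T @ (np.diag(p) - np.outer(p, p)) @ W  # d^2J/ds^2 (positive semidefinite)
    return g, H

def quadratic_approx(W, s, a, eta, eps):
    """Second-order Taylor estimate of loss(W, s + eps * eta, a) around s."""
    g, H = grad_and_hessian(W, s, a)
    return loss(W, s, a) + eps * (g @ eta) + 0.5 * eps**2 * (eta @ H @ eta)

def directional_terms(W, s, a):
    """Slope and curvature of the loss along the normalized gradient direction."""
    g, H = grad_and_hessian(W, s, a)
    eta = g / (np.linalg.norm(g) + 1e-12)
    return g @ eta, eta @ H @ eta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 8))     # 4 actions, 8-dimensional observation
    s = 0.3 * rng.standard_normal(8)
    a = int(np.argmax(W @ s))           # action the policy takes at s
    slope, curvature = directional_terms(W, s, a)
    print("slope along gradient:", slope)
    print("curvature along gradient:", curvature)
```

A detector in this spirit would compare a quantity built from such terms against a cut-off; which quantity and which cut-off the paper uses is not stated in the abstract, so none is fixed here.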

