Strategizing Against Q-Learners: A Control-Theoretical Approach

Impact Factor: 2.4 | JCR Q2: Automation & Control Systems

IEEE Control Systems Letters | Pub Date: 2024-06-18 | DOI: 10.1109/LCSYS.2024.3416240

Yuksel Arslantas; Ege Yuceel; Muhammed O. Sayin

Volume: 8 | Pages: 1733-1738
Full text: https://ieeexplore.ieee.org/document/10561617/

Citations: 0

Abstract


Strategizing Against Q-Learners: A Control-Theoretical Approach

In this letter, we explore the susceptibility of the independent Q-learning algorithms (a classical and widely used multi-agent reinforcement learning method) to strategic manipulation of sophisticated opponents in normal-form games played repeatedly. We quantify how much strategically sophisticated agents can exploit naive Q-learners if they know the opponents’ Q-learning algorithm. To this end, we formulate the strategic actors’ interactions as a stochastic game (whose state encompasses Q-function estimates of the Q-learners) as if the Q-learning algorithms are the underlying dynamical system. We also present a quantization-based approximation scheme to tackle the continuum state space and analyze its performance for two competing strategic actors and a single strategic actor both analytically and numerically.
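The idea of treating the Q-learner as a dynamical system, with its Q-function estimates as the state, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the letter: the payoff matrix `LEARNER_PAYOFF`, the softmax action rule, the simplified stateless update without a bootstrapping term, the step size `alpha`, and the uniform grid used by `quantize`. The abstract does not specify the exact update rule or quantizer.

```python
import numpy as np

# Hypothetical payoff matrix for the Q-learner (an assumption, not from the letter):
# rows are the learner's actions, columns are the strategic actor's actions.
LEARNER_PAYOFF = np.array([[1.0, -1.0],
                           [-1.0, 1.0]])


def softmax_policy(q, temperature=1.0):
    """Map Q-estimates to action probabilities (assumed exploration rule)."""
    z = (q - q.max()) / temperature  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()


def q_update(q, action, reward, alpha=0.1):
    """Simplified stateless Q-learning step: nudge the played action's estimate toward the reward."""
    q_next = q.copy()
    q_next[action] += alpha * (reward - q[action])
    return q_next


def step(q, opponent_action, rng, alpha=0.1, temperature=1.0):
    """One round of the repeated game; the Q-vector acts as the system state
    and the strategic actor's action acts as the control input."""
    learner_action = rng.choice(len(q), p=softmax_policy(q, temperature))
    reward = LEARNER_PAYOFF[learner_action, opponent_action]
    return q_update(q, learner_action, reward, alpha)


def quantize(q, low=-1.0, high=1.0, levels=5):
    """Snap each Q-estimate to the nearest point of a uniform grid,
    turning the continuum state space into a finite one."""
    grid = np.linspace(low, high, levels)
    nearest = np.abs(q[:, None] - grid[None, :]).argmin(axis=1)
    return grid[nearest]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = np.zeros(2)
    for _ in range(20):
        q = step(q, opponent_action=0, rng=rng)
    print("Q-state:", q, "quantized:", quantize(q))
```

Under these assumptions, a strategic actor who knows `step` could plan over the finite set of quantized Q-states with standard dynamic programming. A finer grid (larger `levels`) would presumably track the true state more closely at the cost of a larger state space.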


Source journal: IEEE Control Systems Letters (Mathematics: Control and Optimization)

CiteScore: 4.40

Self-citation rate: 13.30%

Articles published: 471