Reinforcement Learning of Informed Initial Policies for Decentralized Planning

IF 2.2 · JCR Q3 · Computer Science, Artificial Intelligence
Landon Kraemer, Bikramjit Banerjee
DOI: 10.1145/2668130 · Journal Article · Published 2015-01-14 · ACM Transactions on Autonomous and Adaptive Systems
Citations: 7

Abstract

Reinforcement Learning of Informed Initial Policies for Decentralized Planning
Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multiagent systems where agents operate with noisy sensors and actuators, as well as local information. Prevalent solution techniques are centralized and model based—limitations that we address by distributed reinforcement learning (RL). We particularly favor alternate learning, where agents alternately learn best responses to each other, which appears to outperform concurrent RL. However, alternate learning requires an initial policy. We propose two principled approaches to generating informed initial policies: a naive approach that lays the foundation for a more sophisticated approach. We empirically demonstrate that the refined approach produces near-optimal solutions in many challenging benchmark settings, staking a claim to being an efficient (and realistic) approximate solver in its own right. Furthermore, alternate best response learning seeded with such policies quickly learns high-quality policies as well.
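The abstract's central idea — agents alternately holding one policy fixed while the other learns a best response, with the outcome depending on the initial policy — can be illustrated with a toy sketch. This is a minimal illustration of alternating best responses in a one-shot cooperative game, not the authors' Dec-POMDP algorithm (which learns over local observation histories); the payoff matrix and function names here are hypothetical.

```python
# Toy sketch: alternate best-response learning in a 2-agent
# cooperative game. The joint reward R[a1][a2] has a weaker
# equilibrium at (0, 0) and the global optimum at (2, 2),
# so where the alternation converges depends on the seed policy.
R = [
    [5, 0, 0],
    [0, 1, 0],
    [0, 0, 10],
]

def best_response(fixed_action, responder):
    """Best action for `responder` (0 or 1) when the other
    agent's action is held fixed."""
    if responder == 0:
        return max(range(3), key=lambda a: R[a][fixed_action])
    return max(range(3), key=lambda a: R[fixed_action][a])

def alternate_learning(a1, a2, rounds=10):
    """Agents alternately learn best responses to each other,
    starting from the initial policies (a1, a2)."""
    for _ in range(rounds):
        a1 = best_response(a2, responder=0)
        a2 = best_response(a1, responder=1)
    return a1, a2, R[a1][a2]

# An uninformed seed can get stuck at the weaker equilibrium,
# while an informed seed reaches the global optimum.
print(alternate_learning(0, 0))  # -> (0, 0, 5)
print(alternate_learning(2, 2))  # -> (2, 2, 10)
```

The dependence on the seed is exactly why the paper's informed initial policies matter: alternating best responses only refines whatever basin the initial policy starts in.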
Source journal
ACM Transactions on Autonomous and Adaptive Systems (Engineering & Technology, Computer Science: Theory & Methods)
CiteScore: 4.80
Self-citation rate: 7.40%
Articles published: 9
Review time: >12 weeks
Journal description: TAAS addresses research on autonomous and adaptive systems being undertaken by an increasingly interdisciplinary research community, and provides a common platform under which this work can be published and disseminated. TAAS encourages contributions aimed at supporting the understanding, development, and control of such systems and of their behaviors. Contributions are expected to be based on sound and innovative theoretical models, algorithms, engineering and programming techniques, infrastructures and systems, or technological and application experiences.