Decentralized fused-learner architectures for Bayesian reinforcement learning

IF 5.1 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Pub Date : 2024-02-13 DOI:10.1016/j.artint.2024.104094

Augustin A. Saucan, Subhro Das, Moe Z. Win

Artificial Intelligence, volume 331, Article 104094. https://www.sciencedirect.com/science/article/pii/S0004370224000304

Citations: 0

Abstract

Decentralized training is a robust solution for learning over an extensive network of distributed agents. Many existing solutions average locally inferred parameters, which constrains the architecture to independent agents with identical learning algorithms. Here, we propose decentralized fused-learner architectures for Bayesian reinforcement learning, named fused Bayesian-learner architectures (FBLAs), that can learn an optimal policy by fusing potentially heterogeneous Bayesian policy gradient learners, i.e., agents that employ different learning architectures to estimate the gradient of a control policy. The novelty of FBLAs lies in fusing the full posterior distributions of the local policy gradients. The higher-order information these posteriors carry, i.e., probabilistic uncertainty, is used to robustly fuse the locally trained parameters. FBLAs find the barycenter of all local posterior densities by minimizing the total Kullback–Leibler divergence from the barycenter distribution to the local posterior densities. The proposed FBLAs are demonstrated on a sensor-selection problem for Bernoulli tracking, where multiple sensors observe a dynamic target and only a subset of sensors is allowed to be active at any time.
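The barycenter step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that every local posterior over the policy gradient is Gaussian and that the learners carry normalized weights; the abstract states neither, and the function name `fuse_gaussian_posteriors` is hypothetical. Under those assumptions, the density q that minimizes the weighted sum of KL(q || p_i) is the normalized weighted geometric mean of the p_i, which for Gaussians reduces to averaging precisions and precision-weighted means.

```python
import numpy as np


def fuse_gaussian_posteriors(means, covs, weights=None):
    """Barycenter of Gaussian densities under sum_i w_i * KL(q || p_i).

    Illustrative sketch only: the Gaussian form and the normalized
    weights are assumptions, not details taken from the paper.
    """
    means = [np.asarray(m, dtype=float) for m in means]
    covs = [np.asarray(c, dtype=float) for c in covs]
    n = len(means)
    if weights is None:
        weights = np.full(n, 1.0 / n)  # equal weight per learner
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # enforce normalization

    # Work in information form: precision and precision-weighted mean.
    precisions = [np.linalg.inv(c) for c in covs]
    fused_precision = sum(w * p for w, p in zip(weights, precisions))
    fused_info = sum(w * (p @ m) for w, p, m in zip(weights, precisions, means))

    fused_cov = np.linalg.inv(fused_precision)
    fused_mean = fused_cov @ fused_info
    return fused_mean, fused_cov


# Two hypothetical learners estimating a scalar gradient; the second
# reports a smaller variance, i.e., it is more confident.
mean, cov = fuse_gaussian_posteriors(
    means=[[0.0], [2.0]],
    covs=[[[1.0]], [[0.25]]],
)
print(mean, cov)
```

In this toy case the fused mean lands closer to the second learner's estimate than a plain parameter average would, which is the sense in which uncertainty information makes the fusion more robust than averaging point estimates.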


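Source journal

Artificial Intelligence (Engineering & Technology - Computer Science: Artificial Intelligence)

CiteScore: 11.20

Self-citation rate: 1.40%

Articles published: 118

Review time: 8 months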

Journal description: The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. The journal also accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.