Reinforcement Learning in Deep Structured Teams: Initial Results with Finite and Infinite Valued Features

2020 IEEE Conference on Control Technology and Applications (CCTA). Pub Date: 2020-08-01. DOI: 10.1109/CCTA41146.2020.9206397

Jalal Arabneydi, Masoud Roudneshin, A. Aghdam


Citations: 5

Abstract

In this paper, we consider Markov chain and linear quadratic models for deep structured teams with discounted and time-average cost functions under two non-classical information structures, namely, deep state sharing and no sharing. In deep structured teams, agents are coupled in their dynamics and cost functions through the deep state, where the deep state refers to a set of orthogonal linear regressions of the states. Specifically, we consider a homogeneous linear regression for Markov chain models (i.e., the empirical distribution of states) and a few orthonormal linear regressions for linear quadratic models (i.e., weighted averages of states). Planning algorithms are developed for the case where the model is known, and reinforcement learning algorithms are proposed for the case where the model is not completely known. The convergence of two model-free (reinforcement learning) algorithms, one for Markov chain models and one for linear quadratic models, is established. The results are then applied to a smart grid.
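As a rough illustration of the two deep-state features named in the abstract, the sketch below computes an empirical distribution of discrete agent states and a weighted average of continuous agent states. The function names, the array layout, and the 1/n normalization of the weighted average are assumptions made for this example; the abstract does not specify them, and this is not the authors' implementation.

```python
import numpy as np

def empirical_distribution(states, num_states):
    """Fraction of agents in each discrete state 0..num_states-1.

    `states` holds one integer state per agent (hypothetical layout).
    """
    counts = np.bincount(np.asarray(states), minlength=num_states)
    return counts / len(states)

def weighted_average(states, weights):
    """Weighted average (1/n) * sum_i weights[i] * states[i].

    `states` has shape (n, d), one row per agent; `weights` has shape (n,).
    The 1/n normalization is an assumption of this sketch.
    """
    states = np.asarray(states, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return (weights[:, None] * states).sum(axis=0) / len(states)

# Four agents over three discrete states (Markov chain setting).
dist = empirical_distribution([0, 1, 1, 2], num_states=3)

# Two agents with two-dimensional states and unit weights (linear quadratic setting).
avg = weighted_average([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0])
```

In both cases the feature has a fixed size that does not grow with the number of agents, which is what lets agents be coupled through an aggregate quantity rather than through every individual state.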

