Reinforcement Learning in Deep Structured Teams: Initial Results with Finite and Infinite Valued Features

Jalal Arabneydi, Masoud Roudneshin, A. Aghdam
{"title":"Reinforcement Learning in Deep Structured Teams: Initial Results with Finite and Infinite Valued Features","authors":"Jalal Arabneydi, Masoud Roudneshin, A. Aghdam","doi":"10.1109/CCTA41146.2020.9206397","DOIUrl":null,"url":null,"abstract":"In this paper, we consider Markov chain and linear quadratic models for deep structured teams with discounted and time-average cost functions under two non-classical information structures, namely, deep state sharing and no sharing. In deep structured teams, agents are coupled in dynamics and cost functions through deep state, where deep state refers to a set of orthogonal linear regressions of the states. In this article, we consider a homogeneous linear regression for Markov chain models (i.e., empirical distribution of states) and a few orthonormal linear regressions for linear quadratic models (i.e., weighted average of states). Some planning algorithms are developed for the case when the model is known, and some reinforcement learning algorithms are proposed for the case when the model is not known completely. The convergence of two model-free (reinforcement learning) algorithms, one for Markov chain models and one for linear quadratic models, is established. The results are then applied to a smart grid.","PeriodicalId":241335,"journal":{"name":"2020 IEEE Conference on Control Technology and Applications (CCTA)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE Conference on Control Technology and Applications (CCTA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCTA41146.2020.9206397","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

In this paper, we consider Markov chain and linear quadratic models for deep structured teams with discounted and time-average cost functions under two non-classical information structures, namely, deep state sharing and no sharing. In deep structured teams, agents are coupled in their dynamics and cost functions through the deep state, which refers to a set of orthogonal linear regressions of the states. We consider a homogeneous linear regression for Markov chain models (i.e., the empirical distribution of the states) and a few orthonormal linear regressions for linear quadratic models (i.e., weighted averages of the states). Planning algorithms are developed for the case when the model is known, and reinforcement learning algorithms are proposed for the case when the model is not completely known. The convergence of two model-free (reinforcement learning) algorithms, one for Markov chain models and one for linear quadratic models, is established. The results are then applied to a smart grid.
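To make the notion of a deep state concrete, below is a minimal Python sketch, not taken from the paper; the function names, the uniform weight vector, and the toy numbers are illustrative. It shows the two feature maps the abstract describes: the empirical distribution of finite-valued states for Markov chain models, and weighted averages of real-valued states for linear quadratic models.

```python
import numpy as np

# Minimal sketch of the "deep state" feature maps described in the
# abstract; an illustration, not the paper's implementation.

def deep_state_markov(states, num_values):
    # Markov chain models: the deep state is the empirical distribution
    # of the agents' finite-valued states, i.e. the fraction of agents
    # occupying each state value.
    counts = np.bincount(states, minlength=num_values)
    return counts / len(states)

def deep_state_lq(states, weights):
    # Linear quadratic models: the deep state is a small set of linear
    # regressions (weighted averages) of the agents' real-valued states.
    # Each row of `weights` is one regression vector; orthonormality of
    # the rows is assumed here only for illustration.
    return weights @ states / len(states)

# Hypothetical example with 5 agents.
x_finite = np.array([0, 1, 1, 2, 0])              # finite-valued features
print(deep_state_markov(x_finite, num_values=3))  # -> [0.4 0.4 0.2]

x_real = np.array([1.0, -0.5, 2.0, 0.3, 1.2])     # real-valued features
w = np.ones((1, 5))                               # one uniform regression
print(deep_state_lq(x_real, w))                   # -> mean of the states
```

With the uniform weight vector above, the linear quadratic deep state reduces to the mean of the agents' states; the paper's formulation allows several such regressions with different weights.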