Multi-Task Imitation Learning for Linear Dynamical Systems

Thomas Zhang, Katie Kang, Bruce Lee, C. Tomlin, S. Levine, Stephen Tu, N. Matni
{"title":"线性动力系统的多任务模仿学习","authors":"Thomas Zhang, Katie Kang, Bruce Lee, C. Tomlin, S. Levine, Stephen Tu, N. Matni","doi":"10.48550/arXiv.2212.00186","DOIUrl":null,"url":null,"abstract":"We study representation learning for efficient imitation learning over linear systems. In particular, we consider a setting where learning is split into two phases: (a) a pre-training step where a shared $k$-dimensional representation is learned from $H$ source policies, and (b) a target policy fine-tuning step where the learned representation is used to parameterize the policy class. We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\\tilde{O}\\left( \\frac{k n_x}{HN_{\\mathrm{shared}}} + \\frac{k n_u}{N_{\\mathrm{target}}}\\right)$, where $n_x>k$ is the state dimension, $n_u$ is the input dimension, $N_{\\mathrm{shared}}$ denotes the total amount of data collected for each policy during representation learning, and $N_{\\mathrm{target}}$ is the amount of target task data. This result formalizes the intuition that aggregating data across related tasks to learn a representation can significantly improve the sample efficiency of learning a target task. The trends suggested by this bound are corroborated in simulation.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Multi-Task Imitation Learning for Linear Dynamical Systems\",\"authors\":\"Thomas Zhang, Katie Kang, Bruce Lee, C. Tomlin, S. Levine, Stephen Tu, N. Matni\",\"doi\":\"10.48550/arXiv.2212.00186\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We study representation learning for efficient imitation learning over linear systems. 
In particular, we consider a setting where learning is split into two phases: (a) a pre-training step where a shared $k$-dimensional representation is learned from $H$ source policies, and (b) a target policy fine-tuning step where the learned representation is used to parameterize the policy class. We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\\\\tilde{O}\\\\left( \\\\frac{k n_x}{HN_{\\\\mathrm{shared}}} + \\\\frac{k n_u}{N_{\\\\mathrm{target}}}\\\\right)$, where $n_x>k$ is the state dimension, $n_u$ is the input dimension, $N_{\\\\mathrm{shared}}$ denotes the total amount of data collected for each policy during representation learning, and $N_{\\\\mathrm{target}}$ is the amount of target task data. This result formalizes the intuition that aggregating data across related tasks to learn a representation can significantly improve the sample efficiency of learning a target task. The trends suggested by this bound are corroborated in simulation.\",\"PeriodicalId\":268449,\"journal\":{\"name\":\"Conference on Learning for Dynamics & Control\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Conference on Learning for Dynamics & Control\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2212.00186\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference on Learning for Dynamics & 
Control","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2212.00186","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 8

Abstract

We study representation learning for efficient imitation learning over linear systems. In particular, we consider a setting where learning is split into two phases: (a) a pre-training step where a shared $k$-dimensional representation is learned from $H$ source policies, and (b) a target policy fine-tuning step where the learned representation is used to parameterize the policy class. We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left( \frac{k n_x}{HN_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}}\right)$, where $n_x>k$ is the state dimension, $n_u$ is the input dimension, $N_{\mathrm{shared}}$ denotes the total amount of data collected for each policy during representation learning, and $N_{\mathrm{target}}$ is the amount of target task data. This result formalizes the intuition that aggregating data across related tasks to learn a representation can significantly improve the sample efficiency of learning a target task. The trends suggested by this bound are corroborated in simulation.
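The two-phase scheme in the abstract can be illustrated with a minimal NumPy sketch. Here each expert policy is assumed to factor as $u = F_h \Phi x$ with a shared $k \times n_x$ representation $\Phi$; pre-training estimates each source policy by least squares and extracts $\hat{\Phi}$ from the top-$k$ singular subspace of the stacked estimates, and fine-tuning fits only the small $n_u \times k$ head on target data. This SVD-based heuristic and all dimensions/names below are illustrative assumptions, not the estimator or constants from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
nx, nu, k, H = 10, 3, 2, 20      # state dim, input dim, rep dim, # source tasks
N_shared, N_target = 200, 20     # samples per source policy / target samples

# Assumed data-generating model: shared representation Phi (k x nx) and
# per-task heads F_h (nu x k), so each expert policy is u = F_h @ Phi @ x.
Phi = np.linalg.qr(rng.standard_normal((nx, k)))[0].T   # orthonormal rows
F_src = [rng.standard_normal((nu, k)) for _ in range(H)]
F_tgt = rng.standard_normal((nu, k))

def rollout(F, N):
    """Collect N (state, expert action) pairs; states i.i.d. for simplicity."""
    X = rng.standard_normal((N, nx))
    U = X @ (F @ Phi).T + 0.01 * rng.standard_normal((N, nu))  # noisy labels
    return X, U

# Phase (a): pre-training. Least-squares estimate of each source policy,
# then the top-k right singular subspace of the stacked estimates.
K_hats = []
for F in F_src:
    X, U = rollout(F, N_shared)
    K_hats.append(np.linalg.lstsq(X, U, rcond=None)[0].T)     # nu x nx
stacked = np.vstack(K_hats)                                   # (H*nu) x nx
Phi_hat = np.linalg.svd(stacked, full_matrices=False)[2][:k]  # k x nx

# Phase (b): fine-tuning. Only the nu x k head is fit on target data.
Xt, Ut = rollout(F_tgt, N_target)
F_hat = np.linalg.lstsq(Xt @ Phi_hat.T, Ut, rcond=None)[0].T  # nu x k

# Baseline: fit the full nu x nx policy from scratch on the same target data.
K_scratch = np.linalg.lstsq(Xt, Ut, rcond=None)[0].T
K_true = F_tgt @ Phi
err_rep = np.linalg.norm(F_hat @ Phi_hat - K_true)
err_scratch = np.linalg.norm(K_scratch - K_true)
print(f"policy error with shared rep: {err_rep:.4f}")
print(f"policy error from scratch:    {err_scratch:.4f}")
```

The comparison mirrors the bound's message: with the representation pre-trained from $H$ source tasks, the target phase only needs to identify $k n_u$ parameters instead of $n_x n_u$, so it tolerates much smaller $N_{\mathrm{target}}$.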