Bounds for approximate dynamic programming based on string optimization and curvature

Yajing Liu, E. Chong, A. Pezeshki, W. Moran
53rd IEEE Conference on Decision and Control, published 2014-12-01. DOI: https://doi.org/10.1109/CDC.2014.7040433
Citations: 5

Abstract

In this paper, we develop a systematic approach to deriving guaranteed bounds for approximate dynamic programming (ADP) schemes in optimal control problems. The approach is inspired by our recent results on bounding the performance of greedy strategies when optimizing string functions over a finite horizon. We derive a string-optimization problem for which the optimal strategy is the optimal control solution and the greedy strategy is the ADP solution. Using this construction, we show that any ADP solution achieves a performance that is at least a factor β of the performance of the optimal control solution, which is characterized by Bellman's optimality principle. The factor β depends on the specific ADP scheme, and we characterize it explicitly. To illustrate the applicability of the bounding technique, we present examples of ADP schemes, including the popular rollout method.
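The greedy-versus-optimal comparison at the heart of the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two actions, the reward tables `FIRST` and `NEXT`, and the horizon of 3. The string function simply sums a reward that depends on the previous action, the greedy strategy extends the string one action at a time, and the optimal strategy is found by brute-force enumeration. The resulting ratio describes this toy instance only; it is not the factor β derived in the paper.

```python
from itertools import product

# Hypothetical toy problem (not from the paper): two actions, with a reward
# for the first action and a reward for each (previous, current) pair.
ACTIONS = ("a", "b")
FIRST = {"a": 3.0, "b": 2.0}
NEXT = {("a", "a"): 1.0, ("a", "b"): 0.5, ("b", "a"): 1.0, ("b", "b"): 3.0}


def value(string):
    """Total reward of an action string; the empty string has value 0."""
    total = 0.0
    for t, action in enumerate(string):
        total += FIRST[action] if t == 0 else NEXT[(string[t - 1], action)]
    return total


def greedy(horizon):
    """Extend the string one action at a time, maximizing the value so far."""
    string = ()
    for _ in range(horizon):
        best = max(ACTIONS, key=lambda a: value(string + (a,)))
        string += (best,)
    return string, value(string)


def optimal(horizon):
    """Enumerate every string of the given length and keep the best one."""
    best = max(product(ACTIONS, repeat=horizon), key=value)
    return best, value(best)


if __name__ == "__main__":
    g_string, g_value = greedy(3)
    o_string, o_value = optimal(3)
    print("greedy :", g_string, g_value)
    print("optimal:", o_string, o_value)
    print("ratio  :", g_value / o_value)
```

In this instance the greedy strategy commits to the action with the larger immediate reward and ends up with a lower total than the enumerated optimum, which is the kind of gap a bound of the form "greedy is at least β times optimal" is meant to control.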