Neuro-dynamic programming based on self-organized patterns

J. Si, Y.-T. Wang
Published in: Proceedings of the 1999 IEEE International Symposium on Intelligent Control Intelligent Systems and Semiotics (Cat. No.99CH37014)
DOI: 10.1109/ISIC.1999.796641 (https://doi.org/10.1109/ISIC.1999.796641)
Citations: 1

Abstract

This paper introduces a real-time learning control mechanism as a robust and efficient neuro-dynamic programming scheme. The objective of the learning controller is to optimize a given performance measure by learning to create appropriate control actions through interaction with the environment. The controller starts with no prior knowledge of the system and learns to perform better over time. No complete model of the system's behavior is available; instead, the designer has access to real-time sampled measurements. The state measurements are first analyzed by similarity and organized by proximity. Control actions are then generated according to the resulting state patterns. A critic network 'monitors' the performance of the controller so that a given optimality criterion is achieved. We provide a detailed implementation and performance evaluation of this learning controller on a cart-pole balancing problem.
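
The pipeline outlined in the abstract (group sampled states by proximity, choose actions per state pattern, let a critic score the outcome) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the nearest-prototype grouping rule, the tabular critic with a Q-learning-style temporal-difference update (the paper uses a critic network), the hypothetical names `PatternController`, `radius`, `alpha`, `gamma`, `epsilon`, and `run_episode`, and the toy one-dimensional balancing task that stands in for the cart-pole.

```python
import random


class PatternController:
    """Toy learner: states are grouped into prototypes by proximity and a
    tabular critic scores each (pattern, action) pair. Illustrative only."""

    def __init__(self, actions, radius=0.5, alpha=0.1, gamma=0.95,
                 epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.radius = radius      # max distance for a state to join a prototype
        self.alpha = alpha        # critic learning rate
        self.gamma = gamma        # discount factor
        self.epsilon = epsilon    # exploration probability
        self.prototypes = []      # one representative state per pattern
        self.values = []          # per-pattern list of action values
        self.rng = random.Random(seed)

    def pattern(self, state):
        """Return the index of the nearest prototype; create a new prototype
        when no existing one lies within `radius` of the state."""
        best, best_d = None, float("inf")
        for i, proto in enumerate(self.prototypes):
            d = sum((a - b) ** 2 for a, b in zip(state, proto)) ** 0.5
            if d < best_d:
                best, best_d = i, d
        if best is None or best_d > self.radius:
            self.prototypes.append(tuple(state))
            self.values.append([0.0] * len(self.actions))
            return len(self.prototypes) - 1
        return best

    def act(self, k):
        """Epsilon-greedy choice of an action index for pattern k."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.actions))
        row = self.values[k]
        return row.index(max(row))

    def update(self, k, a, reward, k_next, done):
        """Move the critic's estimate for (k, a) toward the TD target."""
        target = reward
        if not done:
            target += self.gamma * max(self.values[k_next])
        self.values[k][a] += self.alpha * (target - self.values[k][a])


def run_episode(ctrl, max_steps=200):
    """Toy 1-D balancing task (a stand-in for cart-pole): keep |x| <= 1.
    Returns the number of steps survived."""
    x = ctrl.rng.uniform(-0.1, 0.1)
    k = ctrl.pattern((x,))
    for step in range(max_steps):
        a = ctrl.act(k)
        x += 0.2 * ctrl.actions[a] + ctrl.rng.uniform(-0.05, 0.05)
        failed = abs(x) > 1.0
        k_next = ctrl.pattern((x,))
        # Failure signal only: reward is -1 on failure and 0 otherwise.
        ctrl.update(k, a, -1.0 if failed else 0.0, k_next, failed)
        if failed:
            return step + 1
        k = k_next
    return max_steps
```

A controller built as `PatternController(actions=[-1.0, 1.0], radius=0.25)` can be trained by calling `run_episode` repeatedly and watching how the survived step count changes. The prototype list grows only when a state falls outside every existing neighborhood, so the number of patterns is driven by the measurements themselves and not fixed in advance.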