Neuro-dynamic programming based on self-organized patterns
J. Si, Y.-T. Wang
Proceedings of the 1999 IEEE International Symposium on Intelligent Control / Intelligent Systems and Semiotics (Cat. No.99CH37014)
DOI: 10.1109/ISIC.1999.796641
Citations: 1
Abstract
This paper introduces a real-time learning control mechanism as a robust and efficient scheme of neuro-dynamic programming. The objective of the learning controller is to optimize a given performance measure by learning to generate appropriate control actions through interaction with the environment. Starting with no prior knowledge of the system, the controller learns to perform better over time. The system under consideration does not provide a complete model describing its behavior; instead, real-time sampled measurements are available to the designer. The state measurements are first analyzed by similarity and organized by proximity. Control actions are then generated according to the resulting state patterns. A critic network 'monitors' the performance of the controller with respect to a given optimality criterion. We provide a detailed implementation and performance evaluation of this learning controller on a cart-pole balancing problem.
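The pipeline in the abstract (organize state samples by proximity, attach a control decision to each resulting pattern, and let a scalar critic track performance per pattern) can be illustrated with a minimal sketch. This is not the authors' implementation: the competitive-learning clustering, the TD-style critic update, and all constants (`n_clusters`, `lr`, `alpha`, `gamma`) are illustrative assumptions standing in for details not given in the abstract.

```python
import random

random.seed(0)  # deterministic for the toy example below

def nearest(prototypes, x):
    """Index of the prototype closest to state x (squared Euclidean distance)."""
    return min(range(len(prototypes)),
               key=lambda i: sum((p - xi) ** 2 for p, xi in zip(prototypes[i], x)))

def organize(samples, n_clusters=2, lr=0.2, epochs=20):
    """Proximity-based organization: move the winning prototype toward each
    sample (a crude competitive-learning stand-in for the paper's
    self-organized patterns)."""
    protos = [list(s) for s in random.sample(samples, n_clusters)]
    for _ in range(epochs):
        for x in samples:
            k = nearest(protos, x)
            protos[k] = [p + lr * (xi - p) for p, xi in zip(protos[k], x)]
    return protos

# Toy 2-D state samples (think pole angle, angular velocity) drawn from
# two well-separated operating regimes.
samples = ([(random.gauss(-1, 0.2), random.gauss(0, 0.2)) for _ in range(50)]
           + [(random.gauss(1, 0.2), random.gauss(0, 0.2)) for _ in range(50)])
protos = organize(samples)

# One control action per pattern (illustrative values, e.g. push left/right).
actions = [-1.0 if p[0] < 0 else 1.0 for p in protos]

# Critic sketch: one value estimate per pattern, updated by a TD-style rule
# V[k] += alpha * (cost + gamma * V[k'] - V[k]) as states are visited.
V = [0.0] * len(protos)
alpha, gamma = 0.1, 0.95
trajectory = [(-1.0, 0.0), (-0.8, 0.2), (1.0, 0.0)]  # toy visit sequence
costs = [1.0, 1.0, 0.0]                               # toy per-step costs
for (x, c), x_next in zip(zip(trajectory, costs), trajectory[1:]):
    k, kn = nearest(protos, x), nearest(protos, x_next)
    V[k] += alpha * (c + gamma * V[kn] - V[k])
```

In this sketch the two regimes end up with distinct prototypes, so states are mapped to patterns by proximity alone, and the per-pattern critic values accumulate cost information that a controller could use to adjust the per-pattern actions.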