{"title":"Improved stability criteria of ADP control for efficient context-aware decision support systems","authors":"Yury Sokolov, R. Kozma","doi":"10.1109/ICAWST.2013.6765406","journal":{"name":"炎黄地理","volume":"2 1","pages":"41-47"},"publicationDate":"2013-11-01","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"3","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/ICAWST.2013.6765406"}
Improved stability criteria of ADP control for efficient context-aware decision support systems
This paper addresses the stability of approximate dynamic programming (ADP) in sequential decision-making problems, including intelligent control. We employ an ADP control algorithm that iteratively improves the autonomous system's internal model of the external world through continuous interaction with the environment. Through this incremental learning process, the system becomes aware of the consequences of its actions on the world. We extend previous stability results for ADP control to the case of general multi-layer neural network approximators. We demonstrate the benefit of our results in the control of various systems, including the cart-pole balancing problem. Our results show significantly improved learning and control performance compared with the state of the art.