{"title":"近似动态规划","authors":"Václav Šmídl","doi":"10.1201/b10384-137","DOIUrl":null,"url":null,"abstract":"Approximate Dynamic Programming (ADP), also sometimes referred to as neuro-dynamic programming, attempts to overcome some of the limitations of value iteration. Mainly, it is too expensive to compute and store the entire value function, when the state space is large (e.g., Tetris). Furthermore, a strong access to the model is required to reconstruct the optimal policy from the value function. To address these problems, there are three approximations one could make: 1. Approximate the optimal policy 2. Approximate the value function V 3. Approximately satisfy the Bellman equation","PeriodicalId":131575,"journal":{"name":"The Control Systems Handbook","volume":"45 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Approximate Dynamic Programming\",\"authors\":\"Václav Šmídl\",\"doi\":\"10.1201/b10384-137\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Approximate Dynamic Programming (ADP), also sometimes referred to as neuro-dynamic programming, attempts to overcome some of the limitations of value iteration. Mainly, it is too expensive to compute and store the entire value function, when the state space is large (e.g., Tetris). Furthermore, a strong access to the model is required to reconstruct the optimal policy from the value function. To address these problems, there are three approximations one could make: 1. Approximate the optimal policy 2. Approximate the value function V 3. Approximately satisfy the Bellman equation\",\"PeriodicalId\":131575,\"journal\":{\"name\":\"The Control Systems Handbook\",\"volume\":\"45 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-10-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The Control Systems Handbook\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1201/b10384-137\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Control Systems Handbook","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1201/b10384-137","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Approximate Dynamic Programming (ADP), also sometimes referred to as neuro-dynamic programming, attempts to overcome some of the limitations of value iteration. Chiefly, when the state space is large (e.g., Tetris), it is too expensive to compute and store the entire value function. Furthermore, full access to the model is required to reconstruct the optimal policy from the value function. To address these problems, there are three approximations one could make (a sketch of the second is given after the list):

1. Approximate the optimal policy.
2. Approximate the value function V.
3. Approximately satisfy the Bellman equation.
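As a concrete illustration of the second option, the following is a minimal sketch of fitted value iteration with a linear approximation of V on a toy chain MDP. The MDP, the polynomial features, and all hyperparameters are illustrative assumptions, not taken from the source.

```python
# Fitted value iteration sketch: approximate V(s) ~ phi(s) . w with linear
# features, alternating Bellman backups and least-squares refitting.
# Everything below (chain MDP, features, discount, iteration count) is assumed
# for illustration only.
import numpy as np

n_states = 20          # chain of 20 states; the rightmost state is the goal
gamma = 0.95           # discount factor (assumed)
actions = [-1, +1]     # move left or right along the chain

def step(s, a):
    """Deterministic transition and reward: +1 for reaching the goal state."""
    s_next = min(max(s + a, 0), n_states - 1)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward

def features(s):
    """Low-dimensional polynomial features of the normalized state."""
    x = s / (n_states - 1)
    return np.array([1.0, x, x**2])

Phi = np.array([features(s) for s in range(n_states)])
w = np.zeros(3)        # weights of the linear approximation V(s) ~ phi(s) . w

for _ in range(100):
    targets = np.empty(n_states)
    for s in range(n_states):
        if s == n_states - 1:
            targets[s] = 0.0          # terminal goal state: value 0 by convention
            continue
        backups = []
        for a in actions:
            s_next, r = step(s, a)
            v_next = 0.0 if s_next == n_states - 1 else features(s_next) @ w
            backups.append(r + gamma * v_next)   # Bellman backup under current fit
        targets[s] = max(backups)
    # Refit the weights to the backed-up targets by least squares.
    w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)

def greedy_action(s):
    """Recover a greedy policy from the approximate value function."""
    def q(a):
        s_next, r = step(s, a)
        return r + gamma * (0.0 if s_next == n_states - 1 else features(s_next) @ w)
    return max(actions, key=q)

print("approximate V:", np.round(Phi @ w, 3))
print("greedy action in state 3:", greedy_action(3))
```

Because the targets are projected onto a fixed feature space rather than stored exactly, this keeps memory constant in the number of features instead of the number of states, which is precisely the trade-off ADP makes when the state space is too large to enumerate.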