Efficient incremental planning and learning with multi-valued decision diagrams

Q1 Mathematics

Journal of Applied Logic. Pub Date: 2017-07-01. DOI: 10.1016/j.jal.2016.11.032

Jean-Christophe Magnan, Pierre-Henri Wuillemin

Journal of Applied Logic, volume 22, pages 63-90. Publication type: Journal Article. Publisher page: https://www.sciencedirect.com/science/article/pii/S157086831630091X

Citations: 1

Abstract

In the domain of decision-theoretic planning, the factored framework (Factored Markov Decision Process, FMDP) has produced optimized algorithms using structured representations such as Decision Trees (Structured Value Iteration (SVI), Structured Policy Iteration (SPI)) or Algebraic Decision Diagrams (Stochastic Planning Using Decision Diagrams (SPUDD)). Since it may be difficult to build the factored models used by these algorithms, the SDYNA architecture, which combines learning and planning algorithms using structured representations, was introduced. However, the state-of-the-art algorithms for incremental learning, for structured decision-theoretic planning, or for reinforcement learning either require the problem to be specified only with binary variables, or use data structures whose compactness can be improved, or both. In this paper, we propose to use Multi-Valued Decision Diagrams (MDDs) as a more efficient data structure for the SDYNA architecture and describe a planning algorithm and an incremental learning algorithm dedicated to this new structured representation. For both the planning and the learning algorithms, we show experimentally that they yield significant improvements in running time and in the compactness of the computed policy and of the learned model. We then analyze the combination of these two algorithms in an efficient SDYNA instance for simultaneous learning and planning using MDDs.
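The compactness argument in the abstract can be illustrated with a small sketch. It is not the authors' planning or learning algorithm: the class `MDD`, its methods, and the example `reward` function are hypothetical names chosen for illustration, and the diagram is built by brute-force enumeration, which only works for tiny problems. It shows the two standard reduction rules of decision diagrams (drop redundant tests, share identical subgraphs) applied to variables with more than two values.

```python
from itertools import product


class MDD:
    """Reduced multi-valued decision diagram over ordered variables (illustrative)."""

    def __init__(self, domains):
        self.domains = domains  # domains[i] = number of values of variable i
        self.unique = {}        # node key -> node id, so equal nodes are shared
        self.nodes = []         # node id -> ("leaf", value) or (var, children)

    def _intern(self, key):
        # Store each distinct node exactly once.
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def leaf(self, value):
        return self._intern(("leaf", value))

    def node(self, var, children):
        children = tuple(children)
        # Rule 1: a test whose branches all lead to the same node is redundant.
        if all(c == children[0] for c in children):
            return children[0]
        # Rule 2: structurally identical nodes are merged.
        return self._intern((var, children))

    def build(self, f, var=0, prefix=()):
        # Enumerates every assignment: exponential, for demonstration only.
        if var == len(self.domains):
            return self.leaf(f(prefix))
        kids = [self.build(f, var + 1, prefix + (v,))
                for v in range(self.domains[var])]
        return self.node(var, kids)

    def evaluate(self, root, assignment):
        key = self.nodes[root]
        while key[0] != "leaf":
            var, children = key
            key = self.nodes[children[assignment[var]]]
        return key[1]

    def size(self, root):
        # Count the nodes reachable from root.
        seen, stack = set(), [root]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                if self.nodes[n][0] != "leaf":
                    stack.extend(self.nodes[n][1])
        return len(seen)


def reward(x):
    # Hypothetical reward: depends on variables 0 and 2 only.
    return float(x[0] == 2 and x[2] == 1)


domains = [3, 4, 2]  # a 3-valued, a 4-valued and a binary variable
mdd = MDD(domains)
root = mdd.build(reward)
assignments = list(product(*(range(d) for d in domains)))
print(mdd.size(root), "diagram nodes for", len(assignments), "assignments")
```

In this sketch the 3-valued variable is tested by a single node with three outgoing branches. A binary diagram would first have to encode it with two Boolean variables, which is the restriction the abstract refers to.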
