Daniel B. Dias, Leliane N. de Barros, Karina V. Delgado, D. Mauá
{"title":"具有无限视界的可微分规划","authors":"Daniel B. Dias, Leliane N. de Barros, Karina V. Delgado, D. Mauá","doi":"10.5753/kdmile.2022.227974","DOIUrl":null,"url":null,"abstract":"With the recent advances in automated planning based on deep-learning techniques, Deep Reactive Policies (DRPs) have been shown as a powerful framework to solve Markov Decision Processes (MDPs) with a certain degree of complexity, like MDPs with continuous action-state spaces and exogenous events. Some differentiable planning algorithms can learn these policies through policy-gradient techniques considering a finite horizon MDP. However, for certain domains, we do not know the ideal size of the horizon needed to find an optimal solution, even when we have a planning goal description, that can either be a simple reachability goal or a complex goal involving path optimization. This work aims to solve a continuous MDP through differentiable planning, considering the problem horizon as a hyperparameter that can be adjusted for a DRP training process. This preliminary investigation show that it is possible to find better policies by choosing a horizon that encompasses the planning goal.","PeriodicalId":417100,"journal":{"name":"Anais do X Symposium on Knowledge Discovery, Mining and Learning (KDMiLe 2022)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Differentiable Planning with Indefinite Horizon\",\"authors\":\"Daniel B. Dias, Leliane N. de Barros, Karina V. Delgado, D. Mauá\",\"doi\":\"10.5753/kdmile.2022.227974\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the recent advances in automated planning based on deep-learning techniques, Deep Reactive Policies (DRPs) have been shown as a powerful framework to solve Markov Decision Processes (MDPs) with a certain degree of complexity, like MDPs with continuous action-state spaces and exogenous events. Some differentiable planning algorithms can learn these policies through policy-gradient techniques considering a finite horizon MDP. However, for certain domains, we do not know the ideal size of the horizon needed to find an optimal solution, even when we have a planning goal description, that can either be a simple reachability goal or a complex goal involving path optimization. This work aims to solve a continuous MDP through differentiable planning, considering the problem horizon as a hyperparameter that can be adjusted for a DRP training process. 
This preliminary investigation show that it is possible to find better policies by choosing a horizon that encompasses the planning goal.\",\"PeriodicalId\":417100,\"journal\":{\"name\":\"Anais do X Symposium on Knowledge Discovery, Mining and Learning (KDMiLe 2022)\",\"volume\":\"2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Anais do X Symposium on Knowledge Discovery, Mining and Learning (KDMiLe 2022)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5753/kdmile.2022.227974\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Anais do X Symposium on Knowledge Discovery, Mining and Learning (KDMiLe 2022)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5753/kdmile.2022.227974","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
With recent advances in automated planning based on deep-learning techniques, Deep Reactive Policies (DRPs) have been shown to be a powerful framework for solving Markov Decision Processes (MDPs) with a certain degree of complexity, such as MDPs with continuous action-state spaces and exogenous events. Some differentiable planning algorithms can learn these policies through policy-gradient techniques over a finite-horizon MDP. However, for certain domains we do not know the ideal horizon length needed to find an optimal solution, even when we have a planning goal description, which can be either a simple reachability goal or a complex goal involving path optimization. This work aims to solve a continuous MDP through differentiable planning, treating the problem horizon as a hyperparameter that can be adjusted during DRP training. This preliminary investigation shows that it is possible to find better policies by choosing a horizon that encompasses the planning goal.
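To make the idea of the horizon as a hyperparameter concrete, the sketch below trains a small reactive policy by differentiating the cumulative reward of a fixed-length rollout of a toy continuous MDP and compares several horizon values. This is not the paper's implementation: the goal location, dynamics, network size, learning rate, and candidate horizons are illustrative assumptions, written here in JAX.

# Minimal sketch (assumptions, not the authors' code): a deep reactive policy
# trained by differentiating the cumulative reward of a fixed-length rollout,
# with the horizon H treated as a tunable hyperparameter.
# Toy reachability domain: move a 2D point toward GOAL.
import jax
import jax.numpy as jnp

GOAL = jnp.array([5.0, 5.0])  # hypothetical reachability goal


def init_params(key, sizes=(2, 32, 2)):
    """Random weights for a small MLP policy (illustrative size)."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((0.1 * jax.random.normal(sub, (n_in, n_out)),
                       jnp.zeros(n_out)))
    return params


def policy(params, state):
    """Deep reactive policy: maps the current state to a bounded action."""
    x = state
    for w, b in params[:-1]:
        x = jnp.tanh(x @ w + b)
    w, b = params[-1]
    return jnp.tanh(x @ w + b)


def step(state, action):
    """Deterministic toy dynamics with a distance-to-goal reward."""
    next_state = state + 0.5 * action
    reward = -jnp.linalg.norm(next_state - GOAL)
    return next_state, reward


def rollout_loss(params, init_state, horizon):
    """Negative cumulative reward over a rollout of `horizon` steps."""
    def body(state, _):
        action = policy(params, state)
        return step(state, action)  # (next carry, per-step reward)
    _, rewards = jax.lax.scan(body, init_state, None, length=horizon)
    return -jnp.sum(rewards)


def train(horizon, steps=200, lr=0.05, seed=0):
    """Plain gradient descent on the rollout loss for a fixed horizon."""
    params = init_params(jax.random.PRNGKey(seed))
    init_state = jnp.zeros(2)
    loss_and_grad = jax.value_and_grad(rollout_loss)
    for _ in range(steps):
        loss, grads = loss_and_grad(params, init_state, horizon)
        params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return float(loss)


# The horizon is tuned like any other hyperparameter: 5 steps cannot reach the
# goal at (5, 5) with this step size, while 20 or 50 steps can, so the longer
# horizons should end with noticeably lower losses.
for H in (5, 20, 50):
    print(f"H={H:3d}  final loss={train(horizon=H):.3f}")

In this toy setting, a horizon too short to "see" the goal leaves the policy gradient with little useful signal, which is the intuition behind choosing a horizon that encompasses the planning goal.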