Differentiable Planning with Indefinite Horizon
Daniel B. Dias, Leliane N. de Barros, Karina V. Delgado, D. Mauá
Anais do X Symposium on Knowledge Discovery, Mining and Learning (KDMiLe 2022), published 2022-11-28. DOI: 10.5753/kdmile.2022.227974
Abstract
With the recent advances in automated planning based on deep-learning techniques, Deep Reactive Policies (DRPs) have been shown to be a powerful framework for solving Markov Decision Processes (MDPs) of considerable complexity, such as MDPs with continuous state-action spaces and exogenous events. Some differentiable planning algorithms can learn these policies through policy-gradient techniques, assuming a finite-horizon MDP. However, for certain domains the ideal horizon length needed to find an optimal solution is unknown, even when a planning goal description is available, which can be either a simple reachability goal or a complex goal involving path optimization. This work aims to solve a continuous MDP through differentiable planning, treating the problem horizon as a hyperparameter that can be tuned during DRP training. This preliminary investigation shows that it is possible to find better policies by choosing a horizon that encompasses the planning goal.
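To make the idea concrete, below is a minimal sketch (not from the paper) of training a DRP on a toy continuous MDP by differentiating the H-step return through a known transition model, with the horizon H treated as a hyperparameter. The 1-D navigation domain, the network architecture, and all names (`transition`, `reward`, `train_drp`) are illustrative assumptions; this uses a pathwise gradient through the model, one common differentiable-planning variant, and the paper's exact estimator may differ.

```python
import torch
import torch.nn as nn

GOAL = 5.0  # illustrative goal position for a toy 1-D navigation MDP

def transition(state, action):
    # Differentiable dynamics (assumed known): the action shifts the state.
    return state + action

def reward(state, action):
    # Negative distance to the goal plus a small action cost.
    return -(state - GOAL).abs() - 0.01 * action.abs()

def train_drp(horizon, epochs=500, lr=1e-2):
    # Deep Reactive Policy: a small MLP mapping states to actions.
    policy = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        state = torch.zeros(64, 1)  # batch of initial states at x = 0
        ret = torch.zeros(64, 1)
        for _ in range(horizon):    # unroll the horizon, keeping the graph
            action = torch.tanh(policy(state))  # actions bounded in [-1, 1]
            ret = ret + reward(state, action)
            state = transition(state, action)
        loss = -ret.mean()          # gradient ascent on expected return
        opt.zero_grad()
        loss.backward()
        opt.step()
    return -loss.item()

# Treating the horizon as a hyperparameter: with actions bounded in [-1, 1],
# a horizon shorter than the distance to the goal cannot reach it, so the
# learned policy accumulates less return than one whose horizon encompasses it.
for H in (2, 8, 16):
    print(f"H={H:2d}  avg return={train_drp(H):.2f}")
```

In this sketch a horizon of 2 leaves the agent far from x = 5 no matter what the policy does, while horizons of 8 or 16 encompass the goal, mirroring the paper's observation that horizon choice determines whether a good policy is findable at all.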