Differentiable Planning with Indefinite Horizon
Daniel B. Dias, Leliane N. de Barros, Karina V. Delgado, D. Mauá
Anais do X Symposium on Knowledge Discovery, Mining and Learning (KDMiLe 2022), published 2022-11-28. DOI: 10.5753/kdmile.2022.227974
Abstract
With the recent advances in automated planning based on deep-learning techniques, Deep Reactive Policies (DRPs) have been shown to be a powerful framework for solving Markov Decision Processes (MDPs) of considerable complexity, such as MDPs with continuous state-action spaces and exogenous events. Some differentiable planning algorithms can learn these policies through policy-gradient techniques, assuming a finite-horizon MDP. However, for certain domains the ideal horizon length needed to find an optimal solution is unknown, even when a planning goal description is available, which can be either a simple reachability goal or a complex goal involving path optimization. This work aims to solve a continuous MDP through differentiable planning, treating the problem horizon as a hyperparameter that can be tuned during DRP training. This preliminary investigation shows that it is possible to find better policies by choosing a horizon that encompasses the planning goal.
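To make the idea concrete, below is a minimal sketch (not from the paper) of training a DRP on a toy continuous MDP by differentiating the H-step return through a known transition model, with the horizon H treated as a hyperparameter. The 1-D navigation domain, the network architecture, and all names (`transition`, `reward`, `train_drp`) are illustrative assumptions; this uses a pathwise gradient through the model, one common differentiable-planning variant, and the paper's exact estimator may differ.

```python
import torch
import torch.nn as nn

GOAL = 5.0  # illustrative goal position for a toy 1-D navigation MDP

def transition(state, action):
    # Differentiable dynamics (assumed known): the action shifts the state.
    return state + action

def reward(state, action):
    # Negative distance to the goal plus a small action cost.
    return -(state - GOAL).abs() - 0.01 * action.abs()

def train_drp(horizon, epochs=500, lr=1e-2):
    # Deep Reactive Policy: a small MLP mapping states to actions.
    policy = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        state = torch.zeros(64, 1)  # batch of initial states at x = 0
        ret = torch.zeros(64, 1)
        for _ in range(horizon):    # unroll the horizon, keeping the graph
            action = torch.tanh(policy(state))  # actions bounded in [-1, 1]
            ret = ret + reward(state, action)
            state = transition(state, action)
        loss = -ret.mean()          # gradient ascent on expected return
        opt.zero_grad()
        loss.backward()
        opt.step()
    return -loss.item()

# Treating the horizon as a hyperparameter: with actions bounded in [-1, 1],
# a horizon shorter than the distance to the goal cannot reach it, so the
# learned policy accumulates less return than one whose horizon encompasses it.
for H in (2, 8, 16):
    print(f"H={H:2d}  avg return={train_drp(H):.2f}")
```

In this sketch a horizon of 2 leaves the agent far from x = 5 no matter what the policy does, while horizons of 8 or 16 encompass the goal, mirroring the paper's observation that horizon choice determines whether a good policy is findable at all.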