Monte Carlo Tree Search Algorithm for SSPs Under the GUBS Criterion

CLEI Electronic Journal. Pub Date: 2024-08-08. DOI: 10.19153/cleiej.27.3.5

Gabriel Nunes Crispino, Valdinei Freire, Karina Valdivia Delgado

Abstract

The Stochastic Shortest Path (SSP) is a formalism widely used for modeling goal-oriented probabilistic planning problems. When dead ends, which are states from which goal states cannot be reached, are present and cannot be avoided, the standard criterion for solving SSPs is not well defined. For this reason, several alternative criteria for solving SSPs with unavoidable dead ends have been proposed in the literature. One of these is GUBS (Goals with Utility-Based Semantics), a criterion that makes trade-offs between probability-to-goal and cost by combining goal prioritization with Expected Utility Theory. GUBS is a good choice for these problems because it is one of the few criteria known to maintain the α-strong probability-to-goal priority property, which provides guarantees on how a decision criterion can choose policies without having to preprocess any specific SSP problem. Although two exact algorithms for solving GUBS already exist, eGUBS-VI and eGUBS-AO*, both are offline, and there is no algorithm for solving GUBS in an online manner. In this paper we propose UCT-GUBS, an online approximate algorithm based on UCT (a Monte Carlo tree search algorithm) that solves SSPs under the GUBS criterion. We provide an analysis of an empirical evaluation performed on two probabilistic planning domains (Triangle Tireworld and Navigation) to observe how the probability-to-goal and utility values of the resulting policies compare to the optimal values, and how the time performance of UCT-GUBS compares to that of eGUBS-VI and eGUBS-AO*. Our conclusion is that, as with other algorithms, the use of UCT-GUBS has to be evaluated considering the requirements of the application and of the problem being solved. Depending on these factors, it can be a good alternative for obtaining policies in an online fashion while, for some problems, also achieving better time performance than other algorithms.
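The two ingredients named in the abstract can be illustrated with a minimal Python sketch; it is not the authors' UCT-GUBS implementation. The first function is the standard UCB1 rule that UCT uses to select actions at a tree node. The second assumes, purely for illustration, a utility made of an exponential cost term exp(-lam * cost) plus a constant bonus k_g when the goal is reached, and assumes that dead-end trajectories contribute zero utility. The names ucb1_select, gubs_style_utility, expected_utility, lam and k_g are hypothetical and do not come from the paper.

```python
import math


def ucb1_select(counts, values, exploration=math.sqrt(2)):
    """Return the index of the action chosen by the UCB1 rule.

    counts[a] is how often action a was tried at this node,
    values[a] is its current mean value estimate.
    """
    # Try every untried action once before applying the formula.
    for action, n in enumerate(counts):
        if n == 0:
            return action
    total = sum(counts)

    def score(action):
        bonus = exploration * math.sqrt(math.log(total) / counts[action])
        return values[action] + bonus

    return max(range(len(counts)), key=score)


def gubs_style_utility(cost, reached_goal, lam=0.1, k_g=1.0):
    """Assumed utility form: exponential cost term plus a goal bonus."""
    return math.exp(-lam * cost) + (k_g if reached_goal else 0.0)


def expected_utility(p_goal, goal_cost, lam=0.1, k_g=1.0):
    """Expected utility of a policy that reaches the goal with probability
    p_goal at cost goal_cost and otherwise falls into a dead end.

    Assumption: a dead-end trajectory accumulates unbounded cost and gets
    no goal bonus, so it contributes zero utility.
    """
    return p_goal * gubs_style_utility(goal_cost, True, lam, k_g)


if __name__ == "__main__":
    safe = expected_utility(p_goal=0.9, goal_cost=10.0)
    cheap = expected_utility(p_goal=0.6, goal_cost=2.0)
    print("safe policy:", safe, "cheap policy:", cheap)
```

Under these assumptions, raising k_g shifts the preference toward the policy with the higher probability-to-goal, while k_g = 0 favors the cheaper but riskier one, which is the kind of trade-off the criterion is described as making.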
