Probabilistic Model Checking of Stochastic Reinforcement Learning Policies

Dennis Gross, Helge Spieker
{"title":"Probabilistic Model Checking of Stochastic Reinforcement Learning Policies","authors":"Dennis Gross, Helge Spieker","doi":"10.5220/0012357700003636","DOIUrl":null,"url":null,"abstract":"We introduce a method to verify stochastic reinforcement learning (RL) policies. This approach is compatible with any RL algorithm as long as the algorithm and its corresponding environment collectively adhere to the Markov property. In this setting, the future state of the environment should depend solely on its current state and the action executed, independent of any previous states or actions. Our method integrates a verification technique, referred to as model checking, with RL, leveraging a Markov decision process, a trained RL policy, and a probabilistic computation tree logic (PCTL) formula to build a formal model that can be subsequently verified via the model checker Storm. We demonstrate our method's applicability across multiple benchmarks, comparing it to baseline methods called deterministic safety estimates and naive monolithic model checking. Our results show that our method is suited to verify stochastic RL policies.","PeriodicalId":174978,"journal":{"name":"International Conference on Agents and Artificial Intelligence","volume":"18 19","pages":"438-445"},"PeriodicalIF":0.0000,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Agents and Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5220/0012357700003636","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

We introduce a method to verify stochastic reinforcement learning (RL) policies. The approach is compatible with any RL algorithm, provided the algorithm and its environment collectively satisfy the Markov property: the future state of the environment depends solely on its current state and the action executed, independent of any previous states or actions. Our method integrates a verification technique known as model checking with RL, combining a Markov decision process, a trained RL policy, and a probabilistic computation tree logic (PCTL) formula into a formal model that is then verified with the model checker Storm. We demonstrate the method's applicability across multiple benchmarks, comparing it to two baseline methods, deterministic safety estimates and naive monolithic model checking. Our results show that the method is well suited to verifying stochastic RL policies.
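The abstract describes the verification pipeline only at a high level. To make the kind of query concrete, the sketch below uses Storm's Python bindings (stormpy) to check a PCTL reachability property. This is a minimal sketch, not the paper's implementation: it assumes the Markov chain induced by composing the MDP with the trained stochastic policy has already been exported to a PRISM file, and both the filename induced_chain.prism and the state label "unsafe" are hypothetical.

```python
import stormpy

# Hypothetical input: the Markov chain obtained by composing the MDP
# with the trained stochastic policy, exported in PRISM format.
program = stormpy.parse_prism_program("induced_chain.prism")

# PCTL query: the probability of eventually reaching a state labeled "unsafe".
formula_str = 'P=? [F "unsafe"]'
properties = stormpy.parse_properties(formula_str, program)

# Build the explicit-state model and run Storm's model checker.
model = stormpy.build_model(program, properties)
result = stormpy.model_checking(model, properties[0])

# Report the probability from the initial state.
initial_state = model.initial_states[0]
print(f"P(eventually unsafe) = {result.at(initial_state)}")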