Probabilistic Model Checking of Stochastic Reinforcement Learning Policies

Dennis Gross, Helge Spieker
{"title":"Probabilistic Model Checking of Stochastic Reinforcement Learning Policies","authors":"Dennis Gross, Helge Spieker","doi":"10.5220/0012357700003636","DOIUrl":null,"url":null,"abstract":"We introduce a method to verify stochastic reinforcement learning (RL) policies. This approach is compatible with any RL algorithm as long as the algorithm and its corresponding environment collectively adhere to the Markov property. In this setting, the future state of the environment should depend solely on its current state and the action executed, independent of any previous states or actions. Our method integrates a verification technique, referred to as model checking, with RL, leveraging a Markov decision process, a trained RL policy, and a probabilistic computation tree logic (PCTL) formula to build a formal model that can be subsequently verified via the model checker Storm. We demonstrate our method's applicability across multiple benchmarks, comparing it to baseline methods called deterministic safety estimates and naive monolithic model checking. Our results show that our method is suited to verify stochastic RL policies.","PeriodicalId":174978,"journal":{"name":"International Conference on Agents and Artificial Intelligence","volume":"18 19","pages":"438-445"},"PeriodicalIF":0.0000,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Agents and Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5220/0012357700003636","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

We introduce a method to verify stochastic reinforcement learning (RL) policies. The approach is compatible with any RL algorithm, provided the algorithm and its environment collectively satisfy the Markov property: the future state of the environment depends solely on its current state and the action executed, independent of any previous states or actions. Our method integrates a verification technique known as model checking with RL, combining a Markov decision process, a trained RL policy, and a probabilistic computation tree logic (PCTL) formula into a formal model that is then verified with the model checker Storm. We demonstrate the method's applicability across multiple benchmarks, comparing it to two baseline methods, deterministic safety estimates and naive monolithic model checking. Our results show that the method is well suited to verifying stochastic RL policies.
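The abstract describes the verification pipeline only at a high level. To make the kind of query concrete, the sketch below uses Storm's Python bindings (stormpy) to check a PCTL reachability property. This is a minimal sketch, not the paper's implementation: it assumes the Markov chain induced by composing the MDP with the trained stochastic policy has already been exported to a PRISM file, and both the filename induced_chain.prism and the state label "unsafe" are hypothetical.

```python
import stormpy

# Hypothetical input: the Markov chain obtained by composing the MDP
# with the trained stochastic policy, exported in PRISM format.
program = stormpy.parse_prism_program("induced_chain.prism")

# PCTL query: the probability of eventually reaching a state labeled "unsafe".
formula_str = 'P=? [F "unsafe"]'
properties = stormpy.parse_properties(formula_str, program)

# Build the explicit-state model and run Storm's model checker.
model = stormpy.build_model(program, properties)
result = stormpy.model_checking(model, properties[0])

# Report the probability from the initial state.
initial_state = model.initial_states[0]
print(f"P(eventually unsafe) = {result.at(initial_state)}")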