贝叶斯神经网络的概率避障

IF 5.1 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Pub Date : 2024-04-17 DOI:10.1016/j.artint.2024.104132

Matthew Wicker , Luca Laurenti , Andrea Patane , Nicola Paoletti , Alessandro Abate , Marta Kwiatkowska

{"title":"贝叶斯神经网络的概率避障","authors":"Matthew Wicker , Luca Laurenti , Andrea Patane , Nicola Paoletti , Alessandro Abate , Marta Kwiatkowska","doi":"10.1016/j.artint.2024.104132","DOIUrl":null,"url":null,"abstract":"<div><p>Model-based reinforcement learning seeks to simultaneously learn the dynamics of an unknown stochastic environment and synthesise an optimal policy for acting in it. Ensuring the safety and robustness of sequential decisions made through a policy in such an environment is a key challenge for policies intended for safety-critical scenarios. In this work, we investigate two complementary problems: first, computing reach-avoid probabilities for iterative predictions made with dynamical models, with dynamics described by Bayesian neural network (BNN); second, synthesising control policies that are optimal with respect to a given reach-avoid specification (reaching a “target” state, while avoiding a set of “unsafe” states) and a learned BNN model. Our solution leverages interval propagation and backward recursion techniques to compute lower bounds for the probability that a policy's sequence of actions leads to satisfying the reach-avoid specification. Such computed lower bounds provide safety certification for the given policy and BNN model. We then introduce control synthesis algorithms to derive policies maximizing said lower bounds on the safety probability. We demonstrate the effectiveness of our method on a series of control benchmarks characterized by learned BNN dynamics models. On our most challenging benchmark, compared to purely data-driven policies the optimal synthesis algorithm is able to provide more than a four-fold increase in the number of certifiable states and more than a three-fold increase in the average guaranteed reach-avoid probability.</p></div>","PeriodicalId":8434,"journal":{"name":"Artificial Intelligence","volume":"334 ","pages":"Article 104132"},"PeriodicalIF":5.1000,"publicationDate":"2024-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Probabilistic reach-avoid for Bayesian neural networks\",\"authors\":\"Matthew Wicker , Luca Laurenti , Andrea Patane , Nicola Paoletti , Alessandro Abate , Marta Kwiatkowska\",\"doi\":\"10.1016/j.artint.2024.104132\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Model-based reinforcement learning seeks to simultaneously learn the dynamics of an unknown stochastic environment and synthesise an optimal policy for acting in it. Ensuring the safety and robustness of sequential decisions made through a policy in such an environment is a key challenge for policies intended for safety-critical scenarios. In this work, we investigate two complementary problems: first, computing reach-avoid probabilities for iterative predictions made with dynamical models, with dynamics described by Bayesian neural network (BNN); second, synthesising control policies that are optimal with respect to a given reach-avoid specification (reaching a “target” state, while avoiding a set of “unsafe” states) and a learned BNN model. Our solution leverages interval propagation and backward recursion techniques to compute lower bounds for the probability that a policy's sequence of actions leads to satisfying the reach-avoid specification. Such computed lower bounds provide safety certification for the given policy and BNN model. We then introduce control synthesis algorithms to derive policies maximizing said lower bounds on the safety probability. We demonstrate the effectiveness of our method on a series of control benchmarks characterized by learned BNN dynamics models. On our most challenging benchmark, compared to purely data-driven policies the optimal synthesis algorithm is able to provide more than a four-fold increase in the number of certifiable states and more than a three-fold increase in the average guaranteed reach-avoid probability.</p></div>\",\"PeriodicalId\":8434,\"journal\":{\"name\":\"Artificial Intelligence\",\"volume\":\"334 \",\"pages\":\"Article 104132\"},\"PeriodicalIF\":5.1000,\"publicationDate\":\"2024-04-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Artificial Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0004370224000687\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0004370224000687","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

基于模型的强化学习旨在同时学习未知随机环境的动态，并为在该环境中的行动合成最佳政策。确保在这种环境中通过政策做出的连续决策的安全性和稳健性，是针对安全关键场景的政策所面临的主要挑战。在这项工作中，我们研究了两个相辅相成的问题：第一，计算使用贝叶斯神经网络（BNN）描述动态的动态模型迭代预测的到达-避开概率；第二，合成与给定的到达-避开规范（到达 "目标 "状态，同时避开一组 "不安全 "状态）和学习的 BNN 模型相关的最优控制策略。我们的解决方案利用区间传播和后向递归技术，计算出政策行动序列满足到达-避免规范的概率下限。这些计算出的下限为给定的策略和 BNN 模型提供了安全认证。然后，我们引入控制合成算法，推导出使上述安全概率下限最大化的策略。我们在一系列以学习到的 BNN 动态模型为特征的控制基准上证明了我们方法的有效性。在我们最具挑战性的基准上，与纯粹的数据驱动策略相比，最优合成算法能够将可认证状态的数量提高四倍以上，并将平均保证到达-避免概率提高三倍以上。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Probabilistic reach-avoid for Bayesian neural networks

Model-based reinforcement learning seeks to simultaneously learn the dynamics of an unknown stochastic environment and synthesise an optimal policy for acting in it. Ensuring the safety and robustness of sequential decisions made through a policy in such an environment is a key challenge for policies intended for safety-critical scenarios. In this work, we investigate two complementary problems: first, computing reach-avoid probabilities for iterative predictions made with dynamical models, with dynamics described by Bayesian neural network (BNN); second, synthesising control policies that are optimal with respect to a given reach-avoid specification (reaching a “target” state, while avoiding a set of “unsafe” states) and a learned BNN model. Our solution leverages interval propagation and backward recursion techniques to compute lower bounds for the probability that a policy's sequence of actions leads to satisfying the reach-avoid specification. Such computed lower bounds provide safety certification for the given policy and BNN model. We then introduce control synthesis algorithms to derive policies maximizing said lower bounds on the safety probability. We demonstrate the effectiveness of our method on a series of control benchmarks characterized by learned BNN dynamics models. On our most challenging benchmark, compared to purely data-driven policies the optimal synthesis algorithm is able to provide more than a four-fold increase in the number of certifiable states and more than a three-fold increase in the average guaranteed reach-avoid probability.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Artificial Intelligence 工程技术-计算机：人工智能

CiteScore

11.20

自引率

1.40%

发文量

118

审稿时长

8 months

期刊介绍： The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. Additionally, the journal accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ also accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.