An offender–defender safety game

Impact factor: 10.7 · Zone 2 (Computer Science) · JCR Q1: AUTOMATION & CONTROL SYSTEMS

Annual Reviews in Control · Pub Date: 2024-01-01 · DOI: 10.1016/j.arcontrol.2024.100939

Miroslav Krstic

Volume 57, Article 100939

Citations: 0

Abstract


An offender–defender safety game

In this tutorial we study a safety analog of the classical zero-sum differential game with positive definite penalties on the state and the two inputs. Consider a nonlinear system affine in two inputs, which are called “offender” and “defender.” Let the inputs have the opposing objectives in relation to an infinite-time cost which, in addition to penalizing the inputs of both agents, incorporates a safety index of the system (a barrier function), with the defender aiming to maximize the system safety and the offender aiming to minimize it. If there is a pair of (offender, defender) non-Nash feedback policies of the $L_{g} h$ form with a safe outcome, namely, where the defender maintains safety while the offender fails to violate safety, then there exists an inverse optimal pair of policies that attain a Nash equilibrium relative to the safety minimax objective. In the tutorial we study both deterministic and stochastic offenders. The deterministic offender applies its feedback through its deterministic input value, while the stochastic offender applies its feedback through its incremental covariance. In addition to Nash policies for a minimax offender–defender formulation, we provide feedback laws for the defender, in the scenario where the offender action is unrestricted by optimality, and where the defender ensures input-to-state safety in the deterministic and stochastic senses. This tutorial is derived from our recent article on inverse optimal safety filters, by setting the nominal control to zero and declaring the disturbance to be the offender agent.
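The setting in the paragraph above can be written out as a short sketch. The symbols $x$, $f$, $g_1$, $g_2$, $u$, $w$ and the gains $\alpha, \beta > 0$ are illustrative assumptions rather than notation taken from the abstract, and the sign convention shown for the "$L_g h$ form" is one plausible reading, not the article's definition.

```latex
% Assumed notation: u = defender input, w = offender input,
% h = barrier function with safe set {x : h(x) >= 0}.
\begin{align*}
\dot{x} &= f(x) + g_1(x)\,u + g_2(x)\,w, \\
\dot{h} &= L_f h + L_{g_1} h\,u + L_{g_2} h\,w, \\
u &= \alpha\,(L_{g_1} h)^{\top}, \qquad w = -\beta\,(L_{g_2} h)^{\top}, \\
\dot{h} &= L_f h + \alpha\,\lvert L_{g_1} h \rvert^{2} - \beta\,\lvert L_{g_2} h \rvert^{2}.
\end{align*}
```

Under these assumed policies the defender's term can only raise $h$ and the offender's term can only lower it, which is the opposition that the minimax cost formalizes.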

Among several illustrative examples, one is particularly interesting and unconventional. We consider a safety game played on a unicycle vehicle between its two inputs: the angular velocity and the linear velocity, as the opposing players. We consider two scenarios. In the first, the angular velocity, acting as an offender, attempts to run the vehicle into an obstacle by steering, while the linear velocity, acting as a defender, drives the vehicle forward or in reverse to prevent the vehicle being run into the obstacle. In the second scenario, the linear velocity acts as an offender and angular velocity acts as a defender (in the deterministic case by varying the heading rate; in the stochastic case by varying the variance of a white noise driving the heading rate). A “wind” towards the obstacle advantages the offender in both scenarios. The input policies derived are optimal in the sense of their opposite objectives, under the best possible policy of the opponent, under meaningful costs on their actions. The linear velocity input prevails, whether acting in the role of a defender, in which case the collision with the obstacle is prevented, or in the role of an offender, in which case the collision with the obstacle is achieved.
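For reference, the standard unicycle kinematics make the two-input affine structure of this example explicit. The state names $(p_x, p_y, \theta)$ and the wind drift components $c_x, c_y$ are assumptions for illustration; the abstract does not specify the wind model or the barrier function used.

```latex
% Standard unicycle kinematics with an assumed additive wind drift (c_x, c_y).
\begin{align*}
\dot{p}_x &= v\cos\theta + c_x, \\
\dot{p}_y &= v\sin\theta + c_y, \\
\dot{\theta} &= \omega,
\end{align*}
% Equivalent input-affine form: \dot{x} = f + g_v v + g_\omega \omega, with
\[
f = \begin{bmatrix} c_x \\ c_y \\ 0 \end{bmatrix}, \quad
g_v = \begin{bmatrix} \cos\theta \\ \sin\theta \\ 0 \end{bmatrix}, \quad
g_\omega = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.
\]
```

In the first scenario $\omega$ is the offender and $v$ the defender; in the second the roles are swapped.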


Source journal

Annual Reviews in Control (Engineering and Technology: Automation and Control Systems)

CiteScore: 19.00

Self-citation rate: 2.10%

Review time: 36 days

Journal description: The field of Control is changing very fast now with technology-driven “societal grand challenges” and with the deployment of new digital technologies. The aim of Annual Reviews in Control is to provide comprehensive and visionary views of the field of Control, by publishing the following types of review articles. Survey Article: Review papers on main methodologies or technical advances adding considerable technical value to the state of the art. Note that papers which purely rely on mechanistic searches and lack comprehensive analysis providing a clear contribution to the field will be rejected. Vision Article: Cutting-edge and emerging topics with visionary perspective on the future of the field or how it will bridge multiple disciplines. Tutorial research Article: Fundamental guides for future studies.