K. Krishnamoorthy, D. Casbeer, M. Pachter, "Monotone Optimal Threshold Feedback Policy for Sequential Weapon Target Assignment," J. Aerosp. Inf. Syst., published 2017-02-15. DOI: 10.2514/1.I010501
Monotone Optimal Threshold Feedback Policy for Sequential Weapon Target Assignment
The operational scenario is the following. A bomber with identical weapons travels along a designated route and sequentially encounters enemy (ground) targets. Should the bomber decide to engage a target, the target is destroyed with probability p < 1. Upon successful elimination, the bomber receives a positive reward r drawn from a fixed, known distribution. We stipulate that, before engagement, the bomber observes the target and learns the reward r. Furthermore, upon releasing a weapon, the bomber is alerted as to whether or not the deployed weapon was successful; in other words, we employ a shoot–look–shoot policy. If the target is destroyed, the bomber moves on to the next target. If the target was not destroyed, the bomber can either reengage the current target or move on to the next target in the sequence. The optimal closed-loop control policy that maximizes the expected cumulative reward is obtained via stochastic dynamic programming. Not surprisingly, a weapon is dropped on a target if, and only if, the observed reward is no less than a stage- and state-dependent threshold value. We show that the threshold, as a function, is monotonically decreasing in the number of weapons and monotonically nondecreasing in the number of targets left to be engaged.