K. Krishnamoorthy, D. Casbeer, M. Pachter, "Monotone Optimal Threshold Feedback Policy for Sequential Weapon Target Assignment," J. Aerosp. Inf. Syst., published 2017-02-15. DOI: 10.2514/1.I010501
Monotone Optimal Threshold Feedback Policy for Sequential Weapon Target Assignment
The operational scenario is the following. A bomber with identical weapons travels along a designated route and sequentially encounters enemy (ground) targets. Should the bomber decide to engage a target, the target is destroyed with probability p < 1. Upon successful elimination, the bomber receives a positive reward r drawn from a fixed, known distribution. We stipulate that, before engagement, the bomber observes the target and learns the reward r. Furthermore, upon releasing a weapon, the bomber is alerted as to whether or not the deployed weapon was successful; in other words, we employ a shoot–look–shoot policy. If the target is destroyed, the bomber moves on to the next target. If the target was not destroyed, the bomber can either reengage the current target or move on to the next target in the sequence. The optimal closed-loop control policy that maximizes the expected cumulative reward is obtained via stochastic dynamic programming. Not surprisingly, a weapon is dropped on a target if, and only if, the observed reward is no less than a stage- and state-dependent threshold value. We show that the threshold, as a function, is monotonically decreasing in the number of weapons and monotonically nondecreasing in the number of targets left to be engaged.