Enhancing military medical evacuation dispatching with armed escort management: comparing model-based reinforcement learning approaches

The Journal of Defense Modeling and Simulation: Applications, Methodology, Technology. Pub Date: 2024-04-10. DOI: 10.1177/15485129241229762

Andrew G Gelbard, Phillip R. Jenkins, Matthew J. Robbins

Abstract

The military medical evacuation (MEDEVAC) dispatching problem involves determining optimal policies for evacuating combat casualties to maximize patient survivability during military operations. This study explores a variation of the MEDEVAC dispatching problem, focusing on controlling armed escorts using a Markov decision process (MDP) model and model-based reinforcement learning (RL) approaches. A discounted, continuous-time MDP model over an infinite horizon is developed to maximize the expected total discounted reward of the system. Two model-based RL solution approaches are proposed: one utilizing semi-gradient descent Q-learning and another employing semi-gradient descent SARSA. A computational example, set in western and central Africa during contingency operations, assesses the performance of the RL-generated policies against the myopic policy, which military medical planners currently employ. Solution quality is derived from expected response time, a crucial determinant of life-saving potential in MEDEVAC operations. The research also explores sensitivity analysis and excursion scenarios to evaluate the RL-generated policies further. By explicitly controlling armed escort assets, dispatching authorities can better manage the location and allocation of these resources throughout combat operations. The findings of this study have the potential to inform military medical planning, operations, and tactics, ultimately leading to improved MEDEVAC system performance and higher patient survivability rates.
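The two update rules named in the abstract differ only in how they bootstrap from the next state. The sketch below is a generic illustration, not the authors' implementation. It assumes a linear action-value approximation Q(s, a) = w · x(s, a) and a discrete-time discount factor `gamma`, whereas the paper formulates a continuous-time MDP. All function names, feature vectors, and parameter values are hypothetical.

```python
from typing import List, Sequence


def q_value(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear action-value estimate: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def q_learning_target(weights: Sequence[float],
                      next_features: Sequence[Sequence[float]]) -> float:
    """Off-policy bootstrap: value of the greedy action in the next state."""
    return max(q_value(weights, x) for x in next_features)


def sarsa_target(weights: Sequence[float],
                 next_features: Sequence[Sequence[float]],
                 next_action: int) -> float:
    """On-policy bootstrap: value of the action actually taken next."""
    return q_value(weights, next_features[next_action])


def semi_gradient_step(weights: Sequence[float],
                       features: Sequence[float],
                       reward: float,
                       bootstrap: float,
                       alpha: float,
                       gamma: float) -> List[float]:
    """One semi-gradient TD update.

    The bootstrap value is treated as a constant, so the gradient of the
    linear estimate with respect to the weights is just the feature vector.
    """
    td_error = reward + gamma * bootstrap - q_value(weights, features)
    return [w + alpha * td_error * x for w, x in zip(weights, features)]


# Hypothetical two-feature example: one feature vector per action.
weights = [0.5, 0.0]
current = [0.0, 1.0]                    # features of the (state, action) taken
next_feats = [[1.0, 0.0], [0.0, 1.0]]   # features for each next-state action

w_q = semi_gradient_step(weights, current, 0.0,
                         q_learning_target(weights, next_feats),
                         alpha=0.5, gamma=0.9)
w_sarsa = semi_gradient_step(weights, current, 0.0,
                             sarsa_target(weights, next_feats, next_action=1),
                             alpha=0.5, gamma=0.9)
```

In this toy case the Q-learning target uses the greedy next action (value 0.5), so the second weight moves, while the SARSA target uses the non-greedy action that was actually selected (value 0.0), so the weights stay put. That on-policy versus off-policy distinction is the main difference between the two approaches the study compares.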
