Bridging Heuristic and Deep Learning Approaches to Sensor Tasking

Ashton Harvey, Kathryn B. Laskey, Kuo-Chu Chang
{"title":"Bridging Heuristic and Deep Learning Approaches to Sensor Tasking","authors":"Ashton Harvey, Kathryn B. Laskey, Kuo-Chu Chang","doi":"10.23919/fusion49465.2021.9627020","DOIUrl":null,"url":null,"abstract":"Space is becoming a more crowded and contested domain, but the techniques used to task the sensors monitoring this environment have not significantly changed since the implementation of James Miller’s marginal analysis technique used in the Special Perturbations (SP) Tasker in 2007. Centralized tasker / scheduler approaches have used a Markov Decision Process (MDP) formulation, but myopic solutions fail to account for future states and non-myopic solutions tend to be computationally infeasible at scale. Linares and Furfaro proposed solving an MDP formulation of the Sensor Allocation Problem (SAP) using Deep Reinforcement Learning (DRL). DRL has been instrumental in solving many high-dimensional control problems previously considered too complex to solve at an expert level, including Go, Atari 2600, Dota 2, Starcraft 2 and autonomous driving. Linares and Furfaro showed DRL could converge on effective policies for sets of up to 300 objects in the same orbital plane. Jones expanded on that work to a full three-dimensional case with objects in diverse orbits. DRL methods can require significant training time to learn from an a priori state. This paper builds on past work by applying imitation learning to bootstrap DRL methods with existing heuristic solutions. We show that a Demonstration Guided DRL (DG-DRL) approach can effectively replicate a near-optimal tasker’s performance using trajectories from a sub-optimal heuristic. Further, we show that our approach avoids the poor initial performance typical of online DRL approaches. Code is available as an open source library at: https://github.com/AshHarvey/ssa-gym","PeriodicalId":226850,"journal":{"name":"2021 IEEE 24th International Conference on Information Fusion (FUSION)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 24th International Conference on Information Fusion (FUSION)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/fusion49465.2021.9627020","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Space is becoming a more crowded and contested domain, but the techniques used to task the sensors monitoring this environment have not significantly changed since the implementation of James Miller’s marginal analysis technique used in the Special Perturbations (SP) Tasker in 2007. Centralized tasker/scheduler approaches have used a Markov Decision Process (MDP) formulation, but myopic solutions fail to account for future states, and non-myopic solutions tend to be computationally infeasible at scale. Linares and Furfaro proposed solving an MDP formulation of the Sensor Allocation Problem (SAP) using Deep Reinforcement Learning (DRL). DRL has been instrumental in solving many high-dimensional control problems previously considered too complex to solve at an expert level, including Go, Atari 2600 games, Dota 2, StarCraft II, and autonomous driving. Linares and Furfaro showed DRL could converge on effective policies for sets of up to 300 objects in the same orbital plane. Jones extended that work to the full three-dimensional case with objects in diverse orbits. DRL methods can require significant training time to learn from an a priori state. This paper builds on past work by applying imitation learning to bootstrap DRL methods with existing heuristic solutions. We show that a Demonstration Guided DRL (DG-DRL) approach can effectively replicate a near-optimal tasker’s performance using trajectories from a sub-optimal heuristic. Further, we show that our approach avoids the poor initial performance typical of online DRL approaches. Code is available as an open-source library at: https://github.com/AshHarvey/ssa-gym
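The core idea of the demonstration-guided warm start can be illustrated with a short behavior-cloning sketch. The snippet below is not the paper's DG-DRL implementation and does not use the ssa-gym API; the observation layout, the heuristic_action rule, and the network shape are illustrative assumptions. It only shows the general pattern the abstract describes: roll out an existing heuristic tasker to collect state-action demonstrations, then train a policy network to imitate those demonstrations before any online DRL fine-tuning.

```python
# Minimal behavior-cloning warm start: train a policy network to imitate
# state-action pairs produced by a heuristic tasker, before any online RL.
# The heuristic, observation layout, and dimensions are illustrative
# placeholders, not the actual ssa-gym environment or the paper's method.
import numpy as np
import torch
import torch.nn as nn

N_OBJECTS = 20           # catalog size (hypothetical)
OBS_DIM = N_OBJECTS * 3  # per-object uncertainty features (assumed layout)

def heuristic_action(obs: np.ndarray) -> int:
    """Stand-in for a marginal-analysis-style rule: observe the object
    whose first uncertainty feature is currently largest."""
    per_object = obs.reshape(N_OBJECTS, 3)
    return int(np.argmax(per_object[:, 0]))

def collect_demonstrations(n_steps: int = 5000):
    """Build a demonstration set by applying the heuristic to sampled states.
    A real implementation would step the SSA environment instead."""
    states = np.random.rand(n_steps, OBS_DIM).astype(np.float32)
    actions = np.array([heuristic_action(s) for s in states], dtype=np.int64)
    return torch.from_numpy(states), torch.from_numpy(actions)

# Policy network: maps an observation to logits over which object to observe next.
policy = nn.Sequential(
    nn.Linear(OBS_DIM, 128), nn.ReLU(),
    nn.Linear(128, N_OBJECTS),
)

def behavior_clone(policy, states, actions, epochs: int = 20, lr: float = 1e-3):
    """Supervised pre-training: maximize the likelihood of the heuristic's actions."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        logits = policy(states)
        loss = loss_fn(logits, actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy

if __name__ == "__main__":
    s, a = collect_demonstrations()
    behavior_clone(policy, s, a)
    # The cloned policy would then serve as the starting point for online DRL,
    # rather than learning from scratch with poor initial performance.
```

In the setting the abstract describes, the demonstrations would come from stepping the SSA environment with the existing heuristic tasker, and the cloned policy would subsequently be fine-tuned with a DRL algorithm rather than deployed directly.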