Exploring online and offline explainability in deep reinforcement learning for aircraft separation assurance

Frontiers in Aerospace Engineering Pub Date : 2022-12-13 DOI:10.3389/fpace.2022.1071793

Wei Guo, Yi Zhou, Peng Wei



Abstract

Deep Reinforcement Learning (DRL) has demonstrated promising performance in maintaining safe separation among aircraft. In this work, we focus on a specific engineering application of aircraft separation assurance in structured airspace with high-density air traffic. Despite its scalable performance, the non-transparent decision-making process of DRL hinders human users from building trust in such learning-based decision-making tools. To build a trustworthy DRL-based aircraft separation assurance system, we propose a novel framework that provides stepwise explanations of DRL policies for human users. Based on the different needs of human users, our framework integrates 1) a Soft Decision Tree (SDT) as an online explanation provider that displays critical information to human operators in real time; and 2) a saliency method, Linearly Estimated Gradient (LEG), as an offline explanation tool that lets certification agencies conduct more comprehensive analyses at verification time or after an event. Corresponding visualization methods are proposed to present the information in the SDT and LEG efficiently: 1) online explanations are visualized with tree plots and trajectory plots; 2) offline explanations are visualized with saliency maps and position maps. In the BlueSky air traffic simulator, we evaluate the effectiveness of our framework on case studies with complex airspace route structures. Results show that the proposed framework can provide reasonable explanations of multi-agent sequential decision-making. In addition, to support more predictable and trustworthy DRL models, we investigate two specific patterns that DRL policies follow when aircraft are at similar locations in the airspace.
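
To make the idea of a stepwise online explanation more concrete, the sketch below shows how a generic soft decision tree can be read out one node at a time. It is not the authors' implementation. It assumes the common soft decision tree formulation, in which each inner node routes its input to the right child with probability sigmoid(w·x + b) and each leaf holds a softmax distribution over actions. The tree depth, the weights, the two input features, and the three actions are all hypothetical values chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def leaf_path_probabilities(x, weights, biases, depth):
    """Probability of reaching each leaf of a complete binary soft tree.

    Inner nodes are stored in level order (heap indexing): the children
    of node n are 2n + 1 (left) and 2n + 2 (right).
    """
    probs = [1.0]  # reach probability of every node on the current level
    node = 0
    for _ in range(depth):
        next_probs = []
        for p in probs:
            p_right = sigmoid(dot(weights[node], x) + biases[node])
            next_probs.append(p * (1.0 - p_right))  # left child
            next_probs.append(p * p_right)          # right child
            node += 1
        probs = next_probs
    return probs


def explain(x, weights, biases, leaf_logits, depth):
    """Follow the more likely branch at every node and record, per step,
    the node index, the branch taken, and the feature whose weighted
    value |w_j * x_j| is largest at that node."""
    node, steps = 0, []
    for _ in range(depth):
        p_right = sigmoid(dot(weights[node], x) + biases[node])
        go_right = p_right >= 0.5
        top_feature = max(range(len(x)),
                          key=lambda j: abs(weights[node][j] * x[j]))
        steps.append((node, "right" if go_right else "left", top_feature))
        node = 2 * node + 1 + int(go_right)
    leaf = node - (2 ** depth - 1)  # convert heap index to leaf index
    dist = softmax(leaf_logits[leaf])
    action = max(range(len(dist)), key=dist.__getitem__)
    return steps, action, dist


# Hypothetical depth-2 tree over two made-up features and three made-up actions.
weights = [[2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
biases = [0.0, 0.0, 0.0]
leaf_logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0],
               [0.0, 0.0, 2.0], [0.0, 2.0, 0.0]]
x = [1.0, 0.5]

steps, action, dist = explain(x, weights, biases, leaf_logits, depth=2)
for node, branch, feature in steps:
    print(f"node {node}: go {branch}, most influential feature index {feature}")
print("chosen action index:", action)
```

Each recorded step corresponds to one inner node on the path that a tree plot would highlight. The greedy path used here is one simple readout; selecting the leaf with the highest overall path probability from `leaf_path_probabilities` is an equally reasonable alternative.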

