MDPFuzz: testing models solving Markov decision processes

Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. Pub Date: 2021-12-06. DOI: 10.1145/3533767.3534388

Qi Pang, Yuanyuan Yuan, Shuai Wang

Citation count: 8

Abstract

The Markov decision process (MDP) provides a mathematical framework for modeling sequential decision-making problems, many of which are crucial to security and safety, such as autonomous driving and robot control. The rapid development of artificial intelligence research has created efficient methods for solving MDPs, such as deep neural networks (DNNs), reinforcement learning (RL), and imitation learning (IL). However, these popular models for solving MDPs are neither thoroughly tested nor rigorously reliable. We present MDPFuzz, the first black-box fuzz testing framework for models solving MDPs. MDPFuzz forms testing oracles by checking whether the target model enters abnormal and dangerous states. During fuzzing, MDPFuzz decides which mutated state to retain by measuring whether it reduces cumulative rewards or forms a new state sequence. We design efficient techniques to quantify the “freshness” of a state sequence using Gaussian mixture models (GMMs) and dynamic expectation-maximization (DynEM). We also prioritize states with a high potential of revealing crashes by estimating the local sensitivity of target models over states. MDPFuzz is evaluated on five state-of-the-art models for solving MDPs, including supervised DNN, RL, IL, and multi-agent RL. Our evaluation includes scenarios of autonomous driving, aircraft collision avoidance, and two games that are often used to benchmark RL. During a 12-hour run, we find over 80 crash-triggering state sequences on each model. We also find that crash-triggering states, though they look normal, induce neuron activation patterns distinct from those of normal states. We further develop an abnormal behavior detector to harden all the evaluated models, and we repair them with the findings of MDPFuzz to significantly enhance their robustness without sacrificing accuracy.
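The abstract combines four mechanisms: an oracle that flags abnormal states, sensitivity-based prioritization of seeds, a GMM-based "freshness" score kept up to date by DynEM, and a retention rule that keeps a mutant if it lowers cumulative reward or is fresh. The Python sketch below wires these together on a deliberately tiny problem so the control flow is visible. It is a sketch under stated assumptions, not the authors' implementation: the 1-D toy MDP (`rollout`, reward -|s|), the hypothetical model `toy_policy`, the thresholds `crash_bound` and `fresh_tau`, the perturbation-based `sensitivity` estimate, and `OnlineGMM` (a simple online EM over exponentially decayed sufficient statistics, swapped in for DynEM) are all illustrative choices.

```python
# Hypothetical sketch of MDPFuzz-style ideas; not the authors' implementation.
import math
import random


def _gauss(x, mu, var):
    """Density of the 1-D normal distribution N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


class OnlineGMM:
    """1-D Gaussian mixture refined one state at a time.

    Stand-in for DynEM (assumption): each new state adds its responsibilities
    (E-step) to exponentially decayed sufficient statistics (M-step inputs).
    """

    def __init__(self, means, var=1.0, decay=0.99):
        self.decay = decay
        self.n = [1.0] * len(means)             # soft counts per component
        self.s1 = list(means)                   # weighted sums of x
        self.s2 = [m * m + var for m in means]  # weighted sums of x^2

    def params(self):
        total = sum(self.n)
        out = []
        for n, s1, s2 in zip(self.n, self.s1, self.s2):
            n = max(n, 1e-300)  # avoid dividing by an underflowed count
            mu = s1 / n
            out.append((n / total, mu, max(s2 / n - mu * mu, 1e-6)))
        return out

    def density(self, x):
        return sum(w * _gauss(x, mu, var) for w, mu, var in self.params())

    def update(self, x):
        p = [w * _gauss(x, mu, var) for w, mu, var in self.params()]
        z = sum(p) or 1e-300
        for j, pj in enumerate(p):
            r = pj / z  # responsibility of component j for x
            self.n[j] = self.decay * self.n[j] + r
            self.s1[j] = self.decay * self.s1[j] + r * x
            self.s2[j] = self.decay * self.s2[j] + r * x * x


def toy_policy(s):
    """Hypothetical model under test: proportional controller, clipped action."""
    return -max(-1.0, min(1.0, 0.5 * s))


def rollout(s0, policy, steps=20):
    """Toy MDP: s' = s + action + 0.1 (constant drift); reward = -|s'|."""
    s, seq, ret = s0, [], 0.0
    for _ in range(steps):
        s = s + policy(s) + 0.1
        seq.append(s)
        ret -= abs(s)
    return seq, ret


def sensitivity(s0, policy, delta=0.05, trials=5, rng=random):
    """Mean |change in cumulative reward| per unit of a small random perturbation."""
    _, base = rollout(s0, policy)
    total = 0.0
    for _ in range(trials):
        d = rng.uniform(-delta, delta)
        total += abs(rollout(s0 + d, policy)[1] - base) / max(abs(d), 1e-9)
    return total / trials


def fuzz(seeds, policy, iters=200, crash_bound=5.0, fresh_tau=4.0, seed=0):
    """Hypothetical fuzzing loop over initial states; returns (crashes, corpus)."""
    rng = random.Random(seed)
    gmm = OnlineGMM(means=[-1.0, 0.0, 1.0])
    corpus = []  # entries: (initial state, cumulative reward, sensitivity)
    for s in seeds:
        seq, ret = rollout(s, policy)
        for x in seq:
            gmm.update(x)
        corpus.append((s, ret, sensitivity(s, policy, rng=rng)))
    crashes = []
    for _ in range(iters):
        # Prioritize parents whose reward reacts most to small perturbations.
        weights = [c[2] + 1e-6 for c in corpus]
        parent, parent_ret, _ = rng.choices(corpus, weights=weights)[0]
        child = parent + rng.gauss(0.0, 0.5)  # mutate the initial state
        seq, ret = rollout(child, policy)
        if any(abs(x) > crash_bound for x in seq):  # oracle: abnormal state
            crashes.append(child)
            continue
        # Freshness: mean negative log-density of the visited states.
        fresh = -sum(math.log(gmm.density(x) + 1e-300) for x in seq) / len(seq)
        for x in seq:
            gmm.update(x)
        if ret < parent_ret or fresh > fresh_tau:  # lower reward or novel
            corpus.append((child, ret, sensitivity(child, policy, rng=rng)))
    return crashes, corpus
```

Calling `fuzz([0.0, 2.0, 4.0], toy_policy)` returns the crash-inducing initial states together with the retained corpus. In this toy setup a crash only occurs when the initial state lies far enough from the origin that one clipped step cannot bring it back inside the bound, so keeping lower-reward mutants and favoring sensitive parents both push the search outward. The abstract does not say how MDPFuzz represents a state sequence for the GMM, so scoring each visited state independently is a placeholder choice.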
