Evaluating Adaptive Deception Strategies for Cyber Defense with Human Adversaries

Game Theory and Machine Learning for Cyber Security | Pub Date: 2021-09-12 | DOI: 10.1002/9781119723950.ch5

Palvi Aggarwal, Marcus Gutierrez, Chris Kiekintveld, B. Bosanský, Cleotilde González

Citations: 1

Abstract

Using human experiments, we investigate the effectiveness of various algorithms for defensive cyber-deception in an adversarial decision-making task. Our combinatorial Multi-Armed Bandit task is an abstract version of a realistic cybersecurity problem: allocating limited defense resources so that an adversary is most successfully deceived into attacking "fake" nodes (i.e., honeypots) instead of real ones. We propose six algorithms that differ in their degree of determinism, adaptivity, and customization to the human adversary's actions. We test these algorithms in six separate behavioral studies, in each of which humans are paired against one of the six types of defense. We measure the effectiveness of the algorithms according to how humans learn the defense strategies, which reflects how successfully the algorithms deceive human adversaries. We find that the adaptivity of a strategy matters more than the expected optimality of the algorithm. Humans learned, and took advantage of, defense algorithms that were deterministic, nonadaptive, and not customized. At the same time, not all algorithms that were nondeterministic, adaptive, and customized were effective. The Learning with Linear Rewards (LLR) algorithm, which was purely adaptive, was the most successful, suggesting that adaptivity is an important feature of defense algorithms. New ways to customize defense strategies to the adversary's behavior are needed.
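The abstract names Learning with Linear Rewards (LLR) as the most successful defense. As a rough illustration of how an LLR-style defender might allocate honeypots in a combinatorial bandit, here is a minimal Python sketch. It assumes details the abstract does not spell out: N nodes, k honeypots per round, a per-node reward of 1 when the attacker hits a honeypot, and per-node (semi-bandit) feedback. The attacker model (`attack_probs`, a fixed attack distribution) and the helper names `llr_select` and `run_llr` are hypothetical. A stationary simulated attacker does not capture how human adversaries learn, and this is not the chapter's implementation. The exploration bonus follows the commonly stated LLR index, sample mean plus sqrt((L + 1) ln t / m_i), with L set to k.

```python
import math
import random


def llr_select(means, counts, t, k, max_arms):
    """Return the k node indices with the highest LLR index.

    Index = sample mean + sqrt((max_arms + 1) * ln t / m_i). With unit weights
    on every chosen node, the best k-subset is simply the top-k indices.
    """
    index = [
        means[i] + math.sqrt((max_arms + 1) * math.log(t) / counts[i])
        for i in range(len(means))
    ]
    ranked = sorted(range(len(means)), key=lambda i: index[i], reverse=True)
    return sorted(ranked[:k])


def run_llr(attack_probs, k, rounds, seed=0):
    """Simulate an LLR-style defender placing k honeypots per round.

    Hypothetical attacker: each round it hits node i with probability
    attack_probs[i], independent of the defense (it does not learn).
    Returns the fraction of rounds in which the attack landed on a honeypot,
    plus the defender's per-node reward estimates.
    """
    rng = random.Random(seed)
    n = len(attack_probs)
    means = [0.0] * n  # estimated P(attack on node i | node i is a honeypot)
    counts = [0] * n   # how often node i has been used as a honeypot
    caught = 0
    for t in range(1, rounds + 1):
        untried = [i for i in range(n) if counts[i] == 0]
        if untried:
            # Initialization: cover untried nodes first, padding with tried ones.
            tried = [i for i in range(n) if counts[i] > 0]
            honeypots = (untried + tried)[:k]
        else:
            honeypots = llr_select(means, counts, t, k, max_arms=k)
        target = rng.choices(range(n), weights=attack_probs)[0]
        # Semi-bandit feedback: the defender observes each honeypot's outcome.
        for i in honeypots:
            reward = 1.0 if i == target else 0.0
            counts[i] += 1
            means[i] += (reward - means[i]) / counts[i]  # running mean update
        caught += int(target in honeypots)
    return caught / rounds, means


if __name__ == "__main__":
    rate, estimates = run_llr([0.1, 0.4, 0.3, 0.2], k=2, rounds=20000)
    print(f"catch rate: {rate:.3f}")
    print("estimates:", [round(m, 3) for m in estimates])
```

Under these assumptions, `run_llr([0.1, 0.4, 0.3, 0.2], k=2, rounds=20000)` should settle mostly on nodes 1 and 2, the pair the simulated attacker hits most often. Because every chosen node has unit weight, maximizing the summed index over all k-subsets reduces to picking the top-k indices, which keeps the selection step a simple sort.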

Source journal: Game Theory and Machine Learning for Cyber Security
