Adversarial Data Mining: Big Data Meets Cyber Security

Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. Pub Date: 2016-10-24. DOI: 10.1145/2976749.2976753

Murat Kantarcioglu, B. Xi


Citations: 19

Abstract

As more and more cyber security incident data, ranging from system logs to vulnerability scan results, are collected, manually analyzing these data to detect important cyber security events becomes impossible. Hence, data mining techniques are becoming an essential tool for real-world cyber security applications. For example, a report from Gartner [gartner12] claims that "Information security is becoming a big data analytics problem, where massive amounts of data will be correlated, analyzed and mined for meaningful patterns". Of course, data mining/analytics is a means to an end: the ultimate goal is to provide cyber security analysts with prioritized, actionable insights derived from big data. This raises the question: can we directly apply existing techniques to cyber security applications? One of the most important differences between data mining for cyber security and many other data mining applications is the existence of malicious adversaries who continuously adapt their behavior to hide their actions and to make the data mining models ineffective. Unfortunately, traditional data mining techniques are insufficient to handle such adversarial problems directly. The adversaries adapt to the data miner's reactions, and data mining algorithms constructed from a fixed training dataset degrade quickly. To address these concerns, new data mining techniques that are more resilient to such adversarial behavior have been developed over the last few years in the machine learning and data mining communities. We believe that the lessons learned from this research direction will benefit cyber security researchers, who are increasingly applying machine learning and data mining techniques in practice.
To give an overview of recent developments in adversarial data mining, this three-hour tutorial introduces the foundations, the techniques, and the applications of adversarial data mining in cyber security. We first introduce various approaches proposed in the past to defend against active adversaries, such as a minimax approach that minimizes the worst-case error through a zero-sum game. We then discuss a game-theoretic framework that models the sequential actions of the adversary and the data miner as both parties try to maximize their own utilities. We also introduce a modified support vector machine method and a relevance vector machine method to defend against active adversaries. Intrusion detection and malware detection are two important application areas for adversarial data mining models, and both will be discussed in detail during the tutorial. Finally, we discuss practical guidelines on how to use adversarial data mining ideas in generic cyber security applications and how to leverage existing big data management tools to build data mining algorithms for cyber security.
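
As a rough illustration of the two game formulations named in the abstract, the sketch below contrasts a zero-sum minimax choice with a sequential (leader-follower) choice over a small finite game. It is not taken from the tutorial: the error matrix, the attack costs, and the function names `minimax_choice` and `sequential_choice` are all hypothetical, chosen only to make the difference between the two models visible.

```python
from typing import List, Tuple

# Hypothetical numbers, for illustration only.
# ERROR[i][j] = classification error when the data miner deploys
# classifier i and the adversary uses attack j.
ERROR: List[List[float]] = [
    [0.05, 0.40, 0.30],
    [0.10, 0.20, 0.25],
    [0.15, 0.18, 0.30],
]
# Hypothetical cost the adversary pays to mount each attack.
ATTACK_COST: List[float] = [0.0, 0.05, 0.20]


def minimax_choice(error: List[List[float]]) -> Tuple[int, float]:
    """Zero-sum view: assume the adversary always picks the attack that
    maximizes the error, then pick the classifier whose worst case is smallest."""
    worst = [max(row) for row in error]
    best = min(range(len(error)), key=lambda i: worst[i])
    return best, worst[best]


def sequential_choice(
    error: List[List[float]], cost: List[float]
) -> Tuple[int, int, float]:
    """Sequential view: the data miner commits first; the adversary then
    best-responds to maximize its own utility (error minus attack cost).
    The data miner picks the classifier with the lowest resulting error."""
    outcomes = []
    for i, row in enumerate(error):
        # Adversary's best response to classifier i.
        j = max(range(len(row)), key=lambda k: row[k] - cost[k])
        outcomes.append((i, j, row[j]))
    return min(outcomes, key=lambda t: t[2])


if __name__ == "__main__":
    print(minimax_choice(ERROR))                  # (classifier, worst-case error)
    print(sequential_choice(ERROR, ATTACK_COST))  # (classifier, attack, error)
```

With these made-up numbers the minimax rule selects classifier 1, guarding against the most damaging attack, while the sequential rule selects classifier 2, because an adversary that weighs attack cost would not choose the most damaging attack against it. The gap between the two answers is the kind of modeling question the game-theoretic framework is meant to address.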

