PROV-GEM: Automated Provenance Analysis Framework using Graph Embeddings

2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA). Published: 2021-12-01. DOI: 10.1109/ICMLA52953.2021.00273

Maya Kapoor, Joshua Melton, Michael Ridenhour, S. Krishnan, Thomas Moyer

Pages: 1720-1727

Citations: 6

Abstract

Data provenance graphs, detailed traces of system behavior, are a popular construct for analyzing and forecasting malicious cyber activity such as advanced persistent threats (APTs). A critical limitation of existing analysis techniques is the lack of an automated analytic framework to predict APTs. In this work, we address that limitation by extending efficient capture and storage mechanisms to include automated analysis. Specifically, we propose PROV-GEM, a deep graph learning framework that identifies malicious, anomalous behavior in provenance data. Since data provenance graphs are complex datasets, often expressed as heterogeneous attributed multiplex networks, we use a unified relation-aware embedding framework to capture the necessary contexts and associated interactions between the various entities manifest in the data. Furthermore, provenance graphs are by nature rich, detailed structures that are more heavily attributed than the complex systems traditionally used in graph machine learning applications. To that end, our framework uniquely captures "multi-embeddings" that can represent the varied contexts of nodes and their multi-faceted nature. We demonstrate the efficacy of our embeddings by applying PROV-GEM to two publicly available APT provenance graph datasets from StreamSpot and Unicorn. PROV-GEM achieves strong performance on both: 99% accuracy and a 97% F1-score on the StreamSpot dataset, and 97% accuracy and an 89% F1-score on the Unicorn dataset, equaling or outperforming comparable state-of-the-art APT detection models. Unlike other frameworks, PROV-GEM uses an efficient graph convolutional approach coupled with relational self-attention to generate rich graph embeddings that capture the complex topology of data provenance graphs, providing an effective automated analytic framework for APT detection.
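The abstract does not give PROV-GEM's equations, so what follows is only a minimal sketch, under stated assumptions, of the general idea of relation-aware "multi-embeddings". Each node gets one embedding per relation type, computed by a mean-aggregation graph convolution with a relation-specific weight matrix (in the style of R-GCN). A simple softmax attention over those per-relation vectors, standing in for the relational self-attention the abstract mentions, fuses them into a single node embedding. Mean pooling then yields a graph-level vector that a downstream classifier could label benign or malicious. The toy graph, the relation names `read`/`write`/`fork`, the fixed weights, and the query vector are all hypothetical; in a trained model the weights and query would be learned. This is not the authors' implementation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relation_embeddings(features: Dict[str, Vector],
                        edges: List[Tuple[str, str, str]],
                        weights: Dict[str, Matrix]) -> Dict[str, Dict[str, Vector]]:
    """One embedding per (node, relation): average the node's own features with
    its neighbors under that relation, then apply that relation's weights + ReLU."""
    neighbors = defaultdict(list)  # (node, relation) -> neighbor ids
    for src, rel, dst in edges:
        # Edges are treated as undirected in this sketch (an assumption).
        neighbors[(src, rel)].append(dst)
        neighbors[(dst, rel)].append(src)
    out: Dict[str, Dict[str, Vector]] = {}
    for node, x in features.items():
        out[node] = {}
        for rel, w in weights.items():
            group = [x] + [features[u] for u in neighbors[(node, rel)]]
            mean = [sum(col) / len(group) for col in zip(*group)]
            out[node][rel] = [max(0.0, v) for v in matvec(w, mean)]  # ReLU
    return out


def attend(rel_embs: Dict[str, Vector], query: Vector) -> Tuple[Vector, Dict[str, float]]:
    """Fuse a node's per-relation embeddings using softmax attention weights."""
    rels = sorted(rel_embs)
    scores = [sum(q * math.tanh(h) for q, h in zip(query, rel_embs[r])) for r in rels]
    alphas = softmax(scores)
    dim = len(rel_embs[rels[0]])
    fused = [sum(a * rel_embs[r][i] for a, r in zip(alphas, rels)) for i in range(dim)]
    return fused, dict(zip(rels, alphas))


def graph_embedding(features, edges, weights, query) -> Vector:
    """Mean-pool the fused node embeddings into one graph-level vector."""
    per_node = relation_embeddings(features, edges, weights)
    fused = [attend(per_node[n], query)[0] for n in sorted(per_node)]
    return [sum(col) / len(fused) for col in zip(*fused)]


# Hypothetical toy provenance graph: process p1 reads f1, writes f2, forks p2.
# Features are a one-hot node type: [is_process, is_file].
FEATURES = {"p1": [1.0, 0.0], "p2": [1.0, 0.0], "f1": [0.0, 1.0], "f2": [0.0, 1.0]}
EDGES = [("p1", "read", "f1"), ("p1", "write", "f2"), ("p1", "fork", "p2")]
# Hypothetical fixed weights and attention query; a trained model would learn these.
WEIGHTS = {
    "read": [[1.0, 0.0], [0.0, 1.0]],
    "write": [[0.5, 0.5], [0.5, -0.5]],
    "fork": [[1.0, 1.0], [0.0, 0.0]],
}
QUERY = [1.0, 1.0]

if __name__ == "__main__":
    print(graph_embedding(FEATURES, EDGES, WEIGHTS, QUERY))
```

Swapping the mean aggregation for a different convolution, or the single query vector for full self-attention across relations, would keep the same overall shape: several relation-specific views of each node, fused into one vector before pooling.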



