TAGAPT: Toward Automatic Generation of APT Samples With Provenance-Level Granularity

IF 6.3 | Zone 1, Computer Science | JCR Q1: COMPUTER SCIENCE, THEORY & METHODS

IEEE Transactions on Information Forensics and Security | Pub Date: 2025-04-03 | DOI: 10.1109/TIFS.2025.3557742

Wenrui Cheng, Qixuan Yuan, Tiantian Zhu, Tieming Chen, Jie Ying, Aohan Zheng, Mingjun Ma, Chunlin Xiong, Mingqi Lv, Yan Chen

Vol. 20, pp. 4137-4151. IEEE Xplore: https://ieeexplore.ieee.org/document/10948500/


Abstract

Detecting advanced persistent threats (APTs) on a host via data provenance has emerged as a valuable yet challenging task. Compared with attack-rule matching, machine learning approaches offer a new way to detect attacks efficiently, because they can learn autonomously from data and adapt to dynamic environments. However, APT samples are scarce, which makes supervised learning methods impractical here even though they have proven highly effective in other domains (e.g., malware detection). We therefore propose TAGAPT, a system that automatically generates large numbers of APT samples with provenance-level granularity. First, we introduce a deep graph generation model that generalizes from existing graph structures to new ones representing new attack patterns. Second, we propose an attack stage division algorithm that divides each generated graph structure into stage subgraphs. Finally, we design a genetic algorithm that finds the optimal attack technique explanation for each subgraph, yielding fully instantiated APT samples. Experimental results demonstrate that TAGAPT can learn from existing attack patterns and generalize to novel ones. Furthermore, the generated APT samples 1) support efficient threat hunting and 2) help the state-of-the-art (SOTA) attack detection system Kairos filter out 73% of its observed false positives. We have open-sourced the code and the generated samples to support the security community.
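The third step above, a genetic algorithm that picks the best technique explanation for each stage subgraph, can be illustrated with a minimal sketch. The encoding is an assumption for illustration, not TAGAPT's published design: each stage subgraph is reduced to the set of event types it contains, each candidate technique is a hypothetical template (`Technique`) with a `tactic` index giving its kill-chain position, and fitness is the summed Jaccard similarity between stages and templates minus a penalty whenever tactics run backwards. All names (`Technique`, `jaccard`, `fitness`, `evolve`, the technique labels and event types) are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Technique:
    """Hypothetical technique template (not TAGAPT's real knowledge base)."""
    name: str
    tactic: int          # kill-chain position; lower values come earlier
    events: frozenset    # event types this technique is expected to produce


def jaccard(a, b):
    """Jaccard similarity between two sets of event types."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fitness(genome, stages, techniques, order_penalty=0.5):
    """Summed stage/technique similarity, minus a penalty whenever the
    assigned tactics go backwards from one stage to the next."""
    score = sum(jaccard(stage, techniques[g].events)
                for stage, g in zip(stages, genome))
    for prev, cur in zip(genome, genome[1:]):
        if techniques[cur].tactic < techniques[prev].tactic:
            score -= order_penalty
    return score


def evolve(stages, techniques, pop_size=30, generations=100,
           mutation_rate=0.2, elite=2, seed=0):
    """Search for the technique assignment with the highest fitness.

    A genome holds one technique index per stage subgraph.
    """
    rng = random.Random(seed)
    n, k = len(stages), len(techniques)

    def fit(genome):
        return fitness(genome, stages, techniques)

    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fit, reverse=True)
        nxt = [g[:] for g in pop[:elite]]          # elitism: keep the best as-is
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fit)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=fit)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]            # one-point crossover
            for i in range(n):                     # per-gene random mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(k)
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fit)
    return [techniques[g].name for g in best], fit(best)


# Toy input: hypothetical templates and three stage subgraphs,
# each stage summarized only by the event types it contains.
TECHNIQUES = [
    Technique("phishing_download", 0, frozenset({"connect", "write"})),
    Technique("execute_payload", 1, frozenset({"fork", "exec"})),
    Technique("read_credentials", 2, frozenset({"read", "open"})),
    Technique("exfiltrate", 3, frozenset({"read", "connect", "send"})),
]
STAGES = [
    frozenset({"connect", "write"}),
    frozenset({"fork", "exec", "write"}),
    frozenset({"read", "connect", "send"}),
]

labels, score = evolve(STAGES, TECHNIQUES)
print(labels, round(score, 3))
```

On this toy input each stage has a unique best-matching template, and those templates' tactics (0, 1, 3) are already in kill-chain order, so the optimum is `phishing_download`, `execute_payload`, `exfiltrate` with fitness 1 + 2/3 + 1 = 8/3. With only 4^3 = 64 possible assignments, brute force would suffice here; a population-based search only pays off as the number of stages and candidate techniques grows.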


Source Journal

IEEE Transactions on Information Forensics and Security (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 14.40
Self-citation rate: 7.40%
Articles published: 234
Review time: 6.5 months

Journal description: The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.