Self-Training of Cyber-Threat Classification Model With Threat-Payload Centric Augmentation

IF 9.9 | Zone 1, Computer Science | JCR Q1, AUTOMATION & CONTROL SYSTEMS

IEEE Transactions on Industrial Informatics Pub Date : 2024-06-27 DOI:10.1109/TII.2024.3413300

Jae-Yeol Kim;Hyuk-Yoon Kwon

Vol. 20, No. 10, pp. 11740-11750. Full text: https://ieeexplore.ieee.org/document/10574343/

Citations: 0

Abstract



Deep learning (DL)-based threat classification has been investigated as a way to analyze threat events effectively and reduce the human effort required in security operation centers (SOCs). However, human labeling (HL) by SOC security analysts is still necessary for accurate classification of, and response to, unknown threat events and new threat trends. This labeling process consumes significant time and effort, which limits the construction of an efficient SOC response system, especially when immediate responses to newly emerging large-scale threats are needed. To address this, we propose PLC-TPA, a new self-training method for threat classification models. We present a self-training pipeline based on pseudo-labeling with confidence (PLC) for automatic labeling of newly captured threats. To resolve the class imbalance that arises during self-training, we present a novel threat-payload centric augmentation (TPA) method that takes threat-payload characteristics into account. Through extensive experiments, we show that PLC-TPA achieves high threat-classification accuracy, with F1-scores of about 0.973 to 0.988, improving on other self-training methods by 10.9% to 13.4%. Notably, PLC-TPA performs comparably even to HL, with significantly faster response times. These findings suggest that the proposed PLC-TPA can substantially improve DL-based SOC environments. PLC-TPA also outperforms existing methods by 8.3% to 17.4% in comparative experiments.
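The abstract does not spell out how PLC selects its pseudo-labels. As a generic illustration of confidence-based pseudo-labeling (not the authors' PLC-TPA implementation), a minimal sketch might look like the following. The function name `select_pseudo_labels`, the 0.95 threshold, and the example probabilities are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    probs: Sequence[Sequence[float]], threshold: float = 0.95
) -> List[Tuple[int, int]]:
    """Return (sample_index, predicted_class) pairs whose top class
    probability meets the confidence threshold; other samples stay unlabeled.

    Hypothetical helper: the threshold rule is an assumption, not taken
    from the paper.
    """
    selected = []
    for i, row in enumerate(probs):
        # Index of the most probable class for this sample.
        best_class = max(range(len(row)), key=lambda c: row[c])
        if row[best_class] >= threshold:
            selected.append((i, best_class))
    return selected


# Made-up classifier outputs for four newly captured events and three classes.
probs = [
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.02, 0.97, 0.01],
    [0.96, 0.03, 0.01],
]

pseudo = select_pseudo_labels(probs, threshold=0.95)

# Count accepted pseudo-labels per class to see how balanced they are.
class_counts = Counter(label for _, label in pseudo)

print(pseudo)
print(class_counts)
```

Counting the accepted labels per class shows how a pseudo-labeled set can end up skewed toward a few classes, which is the kind of imbalance the abstract says TPA is designed to address.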


Source journal

IEEE Transactions on Industrial Informatics (Engineering & Technology: Engineering, Industrial)

CiteScore: 24.10

Self-citation rate: 8.90%

Articles published: 1202

Review time: 5.1 months

About the journal: The IEEE Transactions on Industrial Informatics is a multidisciplinary journal dedicated to publishing technical papers that connect theory with practical applications of informatics in industrial settings. It focuses on the utilization of information in intelligent, distributed, and agile industrial automation and control systems. The scope includes topics such as knowledge-based and AI-enhanced automation, intelligent computer control systems, flexible and collaborative manufacturing, industrial informatics in software-defined vehicles and robotics, computer vision, industrial cyber-physical and industrial IoT systems, real-time and networked embedded systems, security in industrial processes, industrial communications, systems interoperability, and human-machine interaction.