Authors: Jae-Yeol Kim; Hyuk-Yoon Kwon
DOI: 10.1109/TII.2024.3413300
Journal: IEEE Transactions on Industrial Informatics, vol. 20, no. 10, pp. 11740-11750
Published: 2024-06-27
URL: https://ieeexplore.ieee.org/document/10574343/
Self-Training of Cyber-Threat Classification Model With Threat-Payload Centric Augmentation
Deep learning (DL)-based threat classification has been investigated as a way to analyze threat events effectively and to minimize the human resources required in security operation centers (SOCs). However, human labeling (HL) by SOC security analysts is still necessary for accurate classification of, and response to, unknown threat events and new threat trends. This labeling process consumes significant time and effort, which limits the construction of an efficient SOC response system, especially for immediate responses to newly generated large-scale threats. To address this, we propose PLC-TPA, a new self-training method for threat classification models. We present a self-training pipeline based on pseudo-labeling with confidence (PLC) for automatic labeling of newly captured threats. To resolve class imbalance during self-training, we present a novel threat-payload centric augmentation (TPA) method that takes threat-payload characteristics into account. Through extensive experiments, we show that PLC-TPA achieves high threat classification accuracy, with F1-scores of about 0.973 to 0.988, improving on other self-training methods by 10.9% to 13.4%. Notably, PLC-TPA performs comparably even to HL, with significantly faster response times. These findings suggest that the proposed PLC-TPA can substantially improve DL-based SOC environments. PLC-TPA also outperforms existing methods by 8.3% to 17.4% in comparative experiments.
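The abstract does not give the details of the PLC step, so the following is only a minimal sketch of generic confidence-thresholded pseudo-labeling, not the authors' implementation. The function name `select_pseudo_labels`, the 0.95 threshold, and the example probabilities are all illustrative assumptions. The idea shown is that an unlabeled threat event receives a pseudo-label only when the classifier's highest class probability clears the threshold; less confident events stay unlabeled.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    probabilities: Sequence[Sequence[float]],
    threshold: float = 0.95,
) -> List[Tuple[int, int, float]]:
    """Return (sample_index, predicted_class, confidence) for confident predictions.

    `probabilities` holds one class-probability vector per unlabeled sample.
    The default threshold is an illustrative assumption, not a value from the paper.
    """
    selected = []
    for index, class_probs in enumerate(probabilities):
        confidence = max(class_probs)
        # Keep the sample only if the model is confident enough about its top class.
        if confidence >= threshold:
            predicted_class = list(class_probs).index(confidence)
            selected.append((index, predicted_class, confidence))
    return selected


# Hypothetical classifier outputs for four unlabeled threat events over three classes.
unlabeled_probs = [
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.02, 0.97, 0.01],
    [0.10, 0.30, 0.60],
]

pseudo_labeled = select_pseudo_labels(unlabeled_probs, threshold=0.95)
for index, label, confidence in pseudo_labeled:
    print(f"sample {index}: pseudo-label {label} (confidence {confidence:.2f})")
```

In a self-training loop, the selected samples would be added to the training set and the model retrained. If confident predictions cluster in a few classes, the pseudo-labeled set becomes imbalanced, which is the problem the abstract says TPA is meant to address.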
About the journal:
The IEEE Transactions on Industrial Informatics is a multidisciplinary journal dedicated to publishing technical papers that connect theory with practical applications of informatics in industrial settings. It focuses on the utilization of information in intelligent, distributed, and agile industrial automation and control systems. The scope includes topics such as knowledge-based and AI-enhanced automation, intelligent computer control systems, flexible and collaborative manufacturing, industrial informatics in software-defined vehicles and robotics, computer vision, industrial cyber-physical and industrial IoT systems, real-time and networked embedded systems, security in industrial processes, industrial communications, systems interoperability, and human-machine interaction.