Real-Time Automated Cyber Threat Classification and Emerging Threat Detection Framework

IEEE Open Journal of the Computer Society | Pub Date: 2025-06-16 | DOI: 10.1109/OJCS.2025.3580235

Alemayehu Tilahun Haile; Surafel Lemma Abebe; Henock Mulugeta Melaku

IEEE Open Journal of the Computer Society, vol. 6, pp. 921-930, 2025. Full text: https://ieeexplore.ieee.org/document/11037544/


Abstract

Automating cyber threat intelligence (CTI) collection and analysis in real time is critical for the timely detection and mitigation of cyber threats. Cybersecurity researchers have recently recommended CTI as a proactive and robust method for automated cyber threat prediction. Such an automated solution collects and analyzes real-time data from social media, cybersecurity forums, and hacker forums, where cybersecurity analysts and hackers discuss cybersecurity-related topics, to discover potential threats. In this article, we propose a comprehensive framework that automates both cyber threat classification and emerging threat detection using real-time data from surface, deep, and dark web sources. We collected real-time data from hacker and security forums to construct binary and multiclass cyber threat classifiers. We used a labeled leaked dataset as ground truth for classification. Machine learning and deep learning techniques were used to perform the classification. Latent Dirichlet allocation (LDA) and nonnegative matrix factorization (NMF) were used to analyze topic distribution over time and identify emerging threats. This approach allows zero-day attacks and other emerging threats to be identified by monitoring shifts in topics. A support vector machine with the bag-of-words (binary term weight) model achieved the highest accuracies, 93.67% and 96.35% for binary and multiclass classification, respectively. Moreover, LDA and NMF were used to extract the top topics for various numbers of topics. The LDA model is well suited to identifying emerging trends and is useful for real-time threat monitoring in cybersecurity.
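The two ideas the abstract names, a support vector machine over binary bag-of-words features and the monitoring of shifts in topic shares, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the toy posts, the whitespace tokenizer, the Pegasos-style training loop (with no bias term), the regularization value, the topic names, and the `min_rise` threshold are all assumptions made for illustration, and the topic shares are assumed to come from a topic model such as LDA or NMF that is not run here.

```python
from typing import Dict, List, Sequence


def build_vocab(docs: Sequence[str]) -> List[str]:
    """Sorted distinct lowercase whitespace tokens (toy tokenizer, an assumption)."""
    return sorted({tok for doc in docs for tok in doc.lower().split()})


def binary_bow(doc: str, vocab: Sequence[str]) -> List[float]:
    """Binary term weights: 1.0 if a vocabulary term occurs at all, else 0.0."""
    present = set(doc.lower().split())
    return [1.0 if term in present else 0.0 for term in vocab]


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 50) -> List[float]:
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    Labels must be +1 or -1. No bias term is learned, for brevity.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink from the regularizer
            if margin < 1.0:  # hinge loss is active for this example
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (threat) if the linear score is positive, else -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


def rising_topics(prev: Dict[str, float], curr: Dict[str, float],
                  min_rise: float) -> List[str]:
    """Topics whose share grew by at least min_rise between two time windows."""
    rises = {name: share - prev.get(name, 0.0) for name, share in curr.items()}
    flagged = [name for name, rise in rises.items() if rise >= min_rise]
    return sorted(flagged, key=lambda name: -rises[name])


# Hypothetical toy posts: +1 = threat-related, -1 = not threat-related.
TRAIN_DOCS = [
    "ransomware exploit kit sale",
    "leaked credentials database dump",
    "zero-day exploit targets vpn",
    "laptop battery replacement advice",
    "recommend beginner python course",
    "forum rules and etiquette",
]
TRAIN_LABELS = [1, 1, 1, -1, -1, -1]

VOCAB = build_vocab(TRAIN_DOCS)
WEIGHTS = train_linear_svm([binary_bow(d, VOCAB) for d in TRAIN_DOCS], TRAIN_LABELS)

if __name__ == "__main__":
    for post in ["fresh exploit kit", "python course advice"]:
        print(post, "->", predict(WEIGHTS, binary_bow(post, VOCAB)))
    # Hypothetical topic shares for two consecutive time windows.
    last_week = {"phishing": 0.5, "ransomware": 0.3, "botnet": 0.2}
    this_week = {"phishing": 0.3, "ransomware": 0.3, "botnet": 0.1, "vpn exploit": 0.3}
    print(rising_topics(last_week, this_week, min_rise=0.1))
```

Binary term weights record only whether a term is present, so a post that repeats "exploit" ten times gets the same feature vector as one that mentions it once. In practice a library classifier and a fitted topic model would replace the hand-written training loop and the hard-coded topic shares.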

