Real-Time Automated Cyber Threat Classification and Emerging Threat Detection Framework

Alemayehu Tilahun Haile;Surafel Lemma Abebe;Henock Mulugeta Melaku
IEEE Open Journal of the Computer Society, vol. 6, pp. 921-930, published 2025-06-16. DOI: 10.1109/OJCS.2025.3580235
https://ieeexplore.ieee.org/document/11037544/

Abstract

Automating the collection and analysis of cyber threat intelligence (CTI) in real time is critical for the timely detection and mitigation of cyber threats. Cybersecurity researchers have recently recommended CTI as a proactive and robust method for automated cyber threat prediction. Such a solution collects and analyzes real-time data from social media, cybersecurity forums, and hacker forums, where security analysts and hackers discuss cybersecurity-related topics, to discover potential threats. In this article, we propose a comprehensive framework that automates both cyber threat classification and emerging threat detection using real-time data from surface, deep, and dark web sources. We collected real-time data from hacker and security forums to build binary and multiclass cyber threat classifiers, using a labeled leaked dataset as ground truth. Machine learning and deep learning techniques were used to perform the classification. Latent Dirichlet allocation (LDA) and nonnegative matrix factorization (NMF) were used to analyze topic distribution over time and identify emerging threats. This approach allows zero-day attacks and other emerging threats to be identified by monitoring shifts in topics. A support vector machine with the bag-of-words (binary term weight) model achieved the highest accuracies, 93.67% and 96.35% for binary and multiclass classification, respectively. LDA and NMF were also used to extract the top topics under various numbers of topics. The LDA model is well suited to identifying emerging trends and is useful for real-time threat monitoring in cybersecurity.
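
The "bag-of-words (binary term weight)" representation named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the function names (`tokenize`, `build_vocabulary`, `binary_bow`), and the two sample posts are assumptions made up for illustration. The point is only that each term contributes 1 if it occurs in a post and 0 otherwise, regardless of how often it appears; vectors of this kind are what a classifier such as a support vector machine would take as input.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def binary_bow(text, vocabulary):
    """Return a 0/1 vector: 1 if a term occurs at all, whatever its count."""
    vector = [0] * len(vocabulary)
    for tok in set(tokenize(text)):
        if tok in vocabulary:  # terms outside the vocabulary are ignored
            vector[vocabulary[tok]] = 1
    return vector


# Hypothetical forum posts, invented for illustration only.
posts = [
    "selling exploit exploit kit for new browser bug",
    "looking for pizza recipes",
]
vocab = build_vocabulary(posts)
vectors = [binary_bow(post, vocab) for post in posts]

# "exploit" appears twice in the first post but is still weighted 1.
print(vectors[0][vocab["exploit"]])
```

In practice a library vectorizer would likely replace the hand-written functions; scikit-learn's `CountVectorizer` with `binary=True`, for example, produces the same kind of 0/1 term matrix.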