EnhanceCTI: Enhanced semantic filtering and feature extraction framework for industry-specific cyber threat intelligence

Impact factor 5.4 | Zone 2 (Computer Science) | JCR Q1 (Computer Science, Information Systems)
Sheng-Shan Chen, Tun-Wen Pai, Chin-Yu Sun
Journal: Computers & Security, volume 158, Article 104649
Publication date: 2025-09-01
Publication type: Journal Article
DOI: 10.1016/j.cose.2025.104649
URL: https://www.sciencedirect.com/science/article/pii/S0167404825003384
Citations: 0

Abstract

The rapid digitization of industry has created an urgent need for robust cyber threat intelligence (CTI) systems. Organizations are increasingly building cyber threat intelligence platforms (TIPs) to gather open-source intelligence (OSINT) and turn it into actionable defenses against information security breaches. However, the volume and complexity of OSINT data, which often includes false or misleading information, make effective CTI analysis difficult. This study introduces EnhanceCTI, a system designed to improve the quality and industry-specific applicability of threat intelligence. EnhanceCTI applies an enhanced semantic filtering method based on DistilBERT, a distilled variant of bidirectional encoder representations from transformers (BERT), to filter intelligence data and determine its alignment with industry-specific data extracted from TIPs. Filtering covers eight industry categories: healthcare, finance, government, technology, education, telecommunications, critical infrastructure, and a miscellaneous “others” category. EnhanceCTI also combines high-credibility CTI features with SentenceBERT to build a merging judgment model, which decides whether a given piece of intelligence should be merged with existing data or stored independently, keeping stored intelligence relevant and minimizing redundancy. Finally, a dedicated platform gives cybersecurity analysts tools to rapidly assess both intelligence quality and the accuracy of the industry-specific classification models. In experiments, EnhanceCTI achieved an F1-score of 0.99 for intelligence identification, and SentenceBERT reached a cosine-similarity Pearson correlation of 0.89. A random forest algorithm trained on 750 manually annotated samples achieved an F1-score of 0.97 on the merging judgment model. These findings highlight EnhanceCTI’s ability to accurately identify threats, offering an industry-tailored solution for institutions facing growing cybersecurity challenges.
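The reported metrics can be made concrete with a small sketch. It assumes that the "cosine Pearson correlation" is the Pearson correlation between the cosine similarities of sentence-embedding pairs and human-annotated similarity scores, which is the usual reading for SentenceBERT evaluations but is not spelled out in the abstract. It also shows F1 for a single positive class; the abstract does not say how F1 is averaged across the eight industry categories. All vectors, gold scores, and labels below are hypothetical toy values, not data from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def f1_score(y_true, y_pred, positive=1):
    """F1 for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy embeddings for three sentence pairs (not from the paper).
embedding_pairs = [
    ([0.9, 0.1, 0.0], [0.8, 0.2, 0.1]),
    ([0.1, 0.9, 0.2], [0.7, 0.1, 0.6]),
    ([0.0, 0.2, 0.9], [0.1, 0.1, 0.8]),
]
# Hypothetical human similarity scores for the same pairs, on a 0..1 scale.
gold_scores = [0.95, 0.30, 0.90]

cosine_scores = [cosine_similarity(u, v) for u, v in embedding_pairs]
print("cosine Pearson:", pearson(cosine_scores, gold_scores))

# Hypothetical merge decisions: 1 = merge with existing data, 0 = store independently.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1]
print("F1 (merge class):", f1_score(y_true, y_pred))
```

In practice the embeddings would come from a sentence-embedding model and the predictions from the trained classifier; the functions here only illustrate how the two kinds of scores are computed.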
Source journal: Computers & Security
Category: Engineering & Technology - Computer Science: Information Systems
CiteScore: 12.40
Self-citation rate: 7.10%
Articles published: 365
Review time: 10.7 months
About the journal: Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world. Computers & Security provides you with a unique blend of leading edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors - industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise it is your first step to fully secure systems.