EnhanceCTI: Enhanced semantic filtering and feature extraction framework for industry-specific cyber threat intelligence

IF 5.4 · Zone 2, Computer Science · JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS

Computers & Security · Pub Date: 2025-09-01 · DOI: 10.1016/j.cose.2025.104649

Sheng-Shan Chen, Tun-Wen Pai, Chin-Yu Sun

Volume 158, Article 104649. Full text: https://www.sciencedirect.com/science/article/pii/S0167404825003384

Citations: 0

Abstract


The rapid digitization of industry has created an urgent need for robust cyber threat intelligence (CTI) systems. Organizations are increasingly building cyber threat intelligence platforms (TIPs) to gather open-source intelligence (OSINT) and turn it into actionable defenses against information security breaches. However, the volume and complexity of OSINT data, which often includes false or misleading information, make effective CTI analysis difficult. This study introduces EnhanceCTI, a system designed to improve the quality and industry-specific applicability of threat intelligence. EnhanceCTI applies an enhanced semantic filtering method based on DistilBERT, a distilled variant of bidirectional encoder representations from transformers (BERT), to filter intelligence data and determine how well it aligns with industry-specific data extracted from TIPs. The filtering covers eight industry categories: healthcare, finance, government, technology, education, telecommunications, critical infrastructure, and a miscellaneous “others” category. EnhanceCTI also combines high-credibility CTI features with SentenceBERT to build a merging judgment model, which decides whether a given piece of intelligence should be merged with existing data or stored independently, keeping stored intelligence relevant and reducing redundancy. Finally, a dedicated platform gives cybersecurity analysts tools to rapidly assess both intelligence quality and the accuracy of the industry-specific classification models.

In experiments, EnhanceCTI achieved an F1-score of 0.99 for intelligence identification, and SentenceBERT reached a cosine-similarity Pearson correlation of 0.89. A random forest algorithm trained on 750 manually annotated samples achieved an F1-score of 0.97 on the merging judgment model. These findings highlight EnhanceCTI’s ability to accurately identify threats, offering an industry-tailored solution for institutions facing growing cybersecurity challenges.
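The metrics and the merge-or-store decision named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the function names, the toy vectors, and the 0.8 threshold are hypothetical, and the simple threshold rule in `should_merge` only stands in for the random forest model the abstract describes. In practice the vectors would come from a sentence-embedding model such as SentenceBERT.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def cosine_pearson(pairs, gold_scores) -> float:
    """Correlate model cosine similarities with human-annotated similarity scores."""
    predicted = [cosine_similarity(u, v) for u, v in pairs]
    return pearson(predicted, gold_scores)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def should_merge(new_vec, existing_vecs, threshold: float = 0.8) -> bool:
    """Hypothetical stand-in for the merging judgment model: merge when the new
    item is close enough (by cosine similarity) to any stored item."""
    best = max((cosine_similarity(new_vec, e) for e in existing_vecs), default=0.0)
    return best >= threshold


if __name__ == "__main__":
    stored = [[0.9, 0.1], [0.0, 1.0]]          # toy embeddings of stored reports
    print(should_merge([1.0, 0.0], stored))    # near-duplicate of the first report
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

A trained classifier would replace the fixed threshold by learning the decision from the similarity score together with other CTI features, which is the role the abstract assigns to the random forest.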


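Source journal

Computers & Security (Engineering & Technology - Computer Science: Information Systems)

CiteScore: 12.40
Self-citation rate: 7.10%
Articles published: 365
Review time: 10.7 months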

About the journal: Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world. Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.