威胁情报知识图谱的实体和关系提取

IF 4.8 2区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Computers & Security Pub Date : 2024-09-18 DOI:10.1016/j.cose.2024.104120

Inoussa Mouiche, Sherif Saad

{"title":"威胁情报知识图谱的实体和关系提取","authors":"Inoussa Mouiche, Sherif Saad","doi":"10.1016/j.cose.2024.104120","DOIUrl":null,"url":null,"abstract":"<div><div>Advanced persistent threats (APTs) represent a complex challenge in cybersecurity as they infiltrate networks stealthily to conduct espionage, steal data, and maintain a long-term presence. To combat these threats, security professionals increasingly rely on cyber knowledge graphs (CKGs), which provide scalable solutions to analyze and structure vast amounts of cyber threat intelligence (CTI) from diverse sources in real-time, enabling the automation of proactive security measures. Developing CKGs requires extracting entity and their relationships from unstructured CTI reports. However, existing approaches face significant limitations, such as difficulties with the nuances of cybersecurity language, diverse threat terminologies, and high rates of error propagation, resulting in low accuracy and poor generalizability. This paper introduces a novel Threat Intelligence Knowledge Graph (TiKG) pipeline designed to address these challenges. The TiKG framework leverages SecureBERT, a domain-specific transformer-based model optimized for cybersecurity, and integrates it with an attention-based BiLSTM to capture the context and nuances of security texts, reducing error propagation and improving extraction accuracy. Additionally, the pipeline incorporates a domain-specific ontology and inference model to ensure precise relation mapping in relation extraction. Using three large-scale TI open-source datasets (DNRTI, STUCCO, and CYNER) and a curated CTI dataset, extensive evaluations demonstrate the effectiveness of our framework, showing significant improvements over existing methods in detecting and linking cyber threats. These contributions provide a robust platform for security professionals to analyze and predict potential attacks, develop effective defenses, and enhance the strategic capabilities of cybersecurity operations.</div></div>","PeriodicalId":51004,"journal":{"name":"Computers & Security","volume":"148 ","pages":"Article 104120"},"PeriodicalIF":4.8000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167404824004255/pdfft?md5=9bc3e5147e5e14a8affa86bf2310d0f8&pid=1-s2.0-S0167404824004255-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Entity and relation extractions for threat intelligence knowledge graphs\",\"authors\":\"Inoussa Mouiche, Sherif Saad\",\"doi\":\"10.1016/j.cose.2024.104120\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Advanced persistent threats (APTs) represent a complex challenge in cybersecurity as they infiltrate networks stealthily to conduct espionage, steal data, and maintain a long-term presence. To combat these threats, security professionals increasingly rely on cyber knowledge graphs (CKGs), which provide scalable solutions to analyze and structure vast amounts of cyber threat intelligence (CTI) from diverse sources in real-time, enabling the automation of proactive security measures. Developing CKGs requires extracting entity and their relationships from unstructured CTI reports. However, existing approaches face significant limitations, such as difficulties with the nuances of cybersecurity language, diverse threat terminologies, and high rates of error propagation, resulting in low accuracy and poor generalizability. This paper introduces a novel Threat Intelligence Knowledge Graph (TiKG) pipeline designed to address these challenges. The TiKG framework leverages SecureBERT, a domain-specific transformer-based model optimized for cybersecurity, and integrates it with an attention-based BiLSTM to capture the context and nuances of security texts, reducing error propagation and improving extraction accuracy. Additionally, the pipeline incorporates a domain-specific ontology and inference model to ensure precise relation mapping in relation extraction. Using three large-scale TI open-source datasets (DNRTI, STUCCO, and CYNER) and a curated CTI dataset, extensive evaluations demonstrate the effectiveness of our framework, showing significant improvements over existing methods in detecting and linking cyber threats. These contributions provide a robust platform for security professionals to analyze and predict potential attacks, develop effective defenses, and enhance the strategic capabilities of cybersecurity operations.</div></div>\",\"PeriodicalId\":51004,\"journal\":{\"name\":\"Computers & Security\",\"volume\":\"148 \",\"pages\":\"Article 104120\"},\"PeriodicalIF\":4.8000,\"publicationDate\":\"2024-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S0167404824004255/pdfft?md5=9bc3e5147e5e14a8affa86bf2310d0f8&pid=1-s2.0-S0167404824004255-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers & Security\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167404824004255\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Security","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167404824004255","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

高级持续性威胁（APT）是网络安全领域的一项复杂挑战，因为它们会隐蔽地渗透到网络中进行间谍活动、窃取数据并保持长期存在。为应对这些威胁，安全专业人员越来越依赖网络知识图谱（CKG），它提供了可扩展的解决方案，可实时分析和构建来自不同来源的大量网络威胁情报（CTI），从而实现主动安全措施的自动化。开发 CKG 需要从非结构化 CTI 报告中提取实体及其关系。然而，现有的方法面临着很大的局限性，例如难以应对网络安全语言的细微差别、威胁术语的多样性以及高错误传播率，从而导致准确性低、通用性差。本文介绍了一种新颖的威胁情报知识图谱（TiKG）管道，旨在应对这些挑战。TiKG 框架利用专为网络安全优化的基于特定领域转换器的 SecureBERT 模型，并将其与基于注意力的 BiLSTM 相集成，以捕捉安全文本的上下文和细微差别，从而减少错误传播并提高提取准确性。此外，该管道还结合了特定领域的本体和推理模型，以确保关系提取中的精确关系映射。通过使用三个大规模 TI 开源数据集（DNRTI、STUCCO 和 CYNER）和一个经过策划的 CTI 数据集，广泛的评估证明了我们框架的有效性，在检测和链接网络威胁方面比现有方法有了显著的改进。这些贡献为安全专业人员分析和预测潜在攻击、开发有效防御以及增强网络安全行动的战略能力提供了一个强大的平台。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Entity and relation extractions for threat intelligence knowledge graphs

Advanced persistent threats (APTs) represent a complex challenge in cybersecurity as they infiltrate networks stealthily to conduct espionage, steal data, and maintain a long-term presence. To combat these threats, security professionals increasingly rely on cyber knowledge graphs (CKGs), which provide scalable solutions to analyze and structure vast amounts of cyber threat intelligence (CTI) from diverse sources in real-time, enabling the automation of proactive security measures. Developing CKGs requires extracting entity and their relationships from unstructured CTI reports. However, existing approaches face significant limitations, such as difficulties with the nuances of cybersecurity language, diverse threat terminologies, and high rates of error propagation, resulting in low accuracy and poor generalizability. This paper introduces a novel Threat Intelligence Knowledge Graph (TiKG) pipeline designed to address these challenges. The TiKG framework leverages SecureBERT, a domain-specific transformer-based model optimized for cybersecurity, and integrates it with an attention-based BiLSTM to capture the context and nuances of security texts, reducing error propagation and improving extraction accuracy. Additionally, the pipeline incorporates a domain-specific ontology and inference model to ensure precise relation mapping in relation extraction. Using three large-scale TI open-source datasets (DNRTI, STUCCO, and CYNER) and a curated CTI dataset, extensive evaluations demonstrate the effectiveness of our framework, showing significant improvements over existing methods in detecting and linking cyber threats. These contributions provide a robust platform for security professionals to analyze and predict potential attacks, develop effective defenses, and enhance the strategic capabilities of cybersecurity operations.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Computers & Security 工程技术-计算机：信息系统

CiteScore

12.40

自引率

7.10%

发文量

365

审稿时长

10.7 months

期刊介绍： Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world. Computers & Security provides you with a unique blend of leading edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors - industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise it is your first step to fully secure systems.