Entity and relation extractions for threat intelligence knowledge graphs

Impact factor 4.8; Zone 2 (Computer Science); JCR Q1 (Computer Science, Information Systems)
Inoussa Mouiche, Sherif Saad
Computers & Security, vol. 148, Article 104120. Published 2024-09-18. DOI: 10.1016/j.cose.2024.104120
https://www.sciencedirect.com/science/article/pii/S0167404824004255
Citations: 0

Abstract

Advanced persistent threats (APTs) represent a complex challenge in cybersecurity: they infiltrate networks stealthily to conduct espionage, steal data, and maintain a long-term presence. To combat these threats, security professionals increasingly rely on cyber knowledge graphs (CKGs), which provide scalable solutions to analyze and structure vast amounts of cyber threat intelligence (CTI) from diverse sources in real time, enabling the automation of proactive security measures. Developing CKGs requires extracting entities and their relationships from unstructured CTI reports. However, existing approaches face significant limitations, such as difficulties with the nuances of cybersecurity language, diverse threat terminologies, and high rates of error propagation, resulting in low accuracy and poor generalizability. This paper introduces a novel Threat Intelligence Knowledge Graph (TiKG) pipeline designed to address these challenges. The TiKG framework leverages SecureBERT, a domain-specific transformer-based model optimized for cybersecurity, and integrates it with an attention-based BiLSTM to capture the context and nuances of security texts, reducing error propagation and improving extraction accuracy. Additionally, the pipeline incorporates a domain-specific ontology and an inference model to ensure precise relation mapping in relation extraction. Extensive evaluations on three large-scale open-source threat intelligence (TI) datasets (DNRTI, STUCCO, and CYNER) and a curated CTI dataset demonstrate the effectiveness of the framework, showing significant improvements over existing methods in detecting and linking cyber threats. These contributions provide a robust platform for security professionals to analyze and predict potential attacks, develop effective defenses, and enhance the strategic capabilities of cybersecurity operations.
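The abstract does not spell out how the ontology constrains relation mapping, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that entities have already been extracted and typed (for example by a named-entity recognition model), and that the ontology can be approximated as a lookup from a pair of entity types to the single relation permitted between them. The type names, relation names, and example entities are all hypothetical placeholders.

```python
from typing import List, NamedTuple, Optional, Tuple


class Entity(NamedTuple):
    text: str
    type: str  # hypothetical type label produced by an upstream NER step


# Hypothetical ontology: (head type, tail type) -> permitted relation.
# These types and relations are illustrative, not taken from the paper.
ONTOLOGY = {
    ("ThreatActor", "Malware"): "uses",
    ("Malware", "Vulnerability"): "exploits",
    ("ThreatActor", "Organization"): "targets",
}

Triple = Tuple[str, str, str]


def map_relation(head: Entity, tail: Entity) -> Optional[Triple]:
    """Return a (head, relation, tail) triple if the ontology allows one."""
    relation = ONTOLOGY.get((head.type, tail.type))
    if relation is None:
        return None  # the ontology defines no relation for this type pair
    return (head.text, relation, tail.text)


def build_triples(entities: List[Entity]) -> List[Triple]:
    """Check every ordered pair of distinct entities against the ontology."""
    triples = []
    for head in entities:
        for tail in entities:
            if head is tail:
                continue
            triple = map_relation(head, tail)
            if triple is not None:
                triples.append(triple)
    return triples


# Placeholder entities, as if extracted from a single CTI sentence.
entities = [
    Entity("ExampleGroup", "ThreatActor"),
    Entity("ExampleRAT", "Malware"),
    Entity("CVE-0000-0000", "Vulnerability"),
]
triples = build_triples(entities)
for triple in triples:
    print(triple)
```

Restricting candidate relations by entity type is one simple way to keep an extraction error from producing an ill-typed edge in the graph; a real pipeline would presumably combine such constraints with a learned relation classifier rather than a fixed lookup.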
Source journal: Computers & Security
Category: Engineering & Technology; Computer Science, Information Systems
CiteScore: 12.40
Self-citation rate: 7.10%
Articles published: 365
Review time: 10.7 months
About the journal: Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world. Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.