{"title":"LLM-TIKG: Threat intelligence knowledge graph construction utilizing large language model","authors":"","doi":"10.1016/j.cose.2024.103999","DOIUrl":null,"url":null,"abstract":"<div><p>Open-source threat intelligence is often unstructured and cannot be directly applied to the next detection and defense. By constructing a knowledge graph through open-source threat intelligence, we can better apply this information to intrusion detection. However, the current methods for constructing knowledge graphs face limitations due to the domain-specific attributes of entities and the analysis of lengthy texts, and they require large amounts of labeled data. Furthermore, there is a lack of authoritative open-source annotated threat intelligence datasets, which require significant manual effort. Moreover, it is noteworthy that current research often neglects the textual descriptions of attack behaviors, resulting in the loss of vital information to understand intricate cyber threats. To address these issues, we propose LLM-TIKG that applies the large language model to construct a knowledge graph from unstructured open-source threat intelligence. The few-shot learning capability of GPT is leveraged to achieve data annotation and augmentation, thereby creating the datasets for fine-tuning a smaller language model (7B). Using the fine-tuned model, we perform topic classification on the collected reports, extract entities and relationships, and extract TTPs from the attack description. This process results in the construction of a threat intelligence knowledge graph, enabling automated and universal analysis of textualized threat intelligence. The experimental results demonstrate improved performance in both named entity recognition and TTP classification, achieving the precision of 87.88% and 96.53%, respectively.</p></div>","PeriodicalId":51004,"journal":{"name":"Computers & Security","volume":null,"pages":null},"PeriodicalIF":4.8000,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Security","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167404824003043","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Open-source threat intelligence is often unstructured and cannot be directly applied to the next detection and defense. By constructing a knowledge graph through open-source threat intelligence, we can better apply this information to intrusion detection. However, the current methods for constructing knowledge graphs face limitations due to the domain-specific attributes of entities and the analysis of lengthy texts, and they require large amounts of labeled data. Furthermore, there is a lack of authoritative open-source annotated threat intelligence datasets, which require significant manual effort. Moreover, it is noteworthy that current research often neglects the textual descriptions of attack behaviors, resulting in the loss of vital information to understand intricate cyber threats. To address these issues, we propose LLM-TIKG that applies the large language model to construct a knowledge graph from unstructured open-source threat intelligence. The few-shot learning capability of GPT is leveraged to achieve data annotation and augmentation, thereby creating the datasets for fine-tuning a smaller language model (7B). Using the fine-tuned model, we perform topic classification on the collected reports, extract entities and relationships, and extract TTPs from the attack description. This process results in the construction of a threat intelligence knowledge graph, enabling automated and universal analysis of textualized threat intelligence. The experimental results demonstrate improved performance in both named entity recognition and TTP classification, achieving the precision of 87.88% and 96.53%, respectively.
期刊介绍:
Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world.
Computers & Security provides you with a unique blend of leading edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors - industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise it is your first step to fully secure systems.