Information Prediction using Knowledge Graphs for Contextual Malware Threat Intelligence

arXiv: Cryptography and Security | Pub Date: 2021-02-10 | DOI: 10.13140/RG.2.2.12526.54083

Nidhi Rastogi, Sharmishtha Dutta, Ryan Christian, Mohammad Zaki, Alex Gittens, C. Aggarwal


Citations: 6

Abstract


Information Prediction using Knowledge Graphs for Contextual Malware Threat Intelligence.

Large amounts of threat intelligence information about malware attacks are available in disparate, typically unstructured, formats. Knowledge graphs can capture this information and its context using RDF triples composed of entities and relations. Sparse or inaccurate threat information, however, leads to challenges such as incomplete or erroneous triples. Named entity recognition (NER) and relation extraction (RE) models used to populate the knowledge graph cannot fully guarantee accurate information retrieval, further exacerbating this problem. This paper proposes an end-to-end approach to generate a Malware Knowledge Graph called MalKG, the first open-source automated knowledge graph for malware threat intelligence. The MalKG dataset, called MT40K, contains approximately 40,000 triples generated from 27,354 unique entities and 34 relations. We demonstrate the application of MalKG in predicting missing malware threat intelligence information in the knowledge graph. For ground truth, we manually curate a knowledge graph called MT3K, with 3,027 triples generated from 5,741 unique entities and 22 relations. For entity prediction via a state-of-the-art entity prediction model (TuckER), our approach achieves 80.4 for the hits@10 metric (which checks whether the correct entity appears among the top 10 predicted options for a missing entity in the knowledge graph) and 0.75 for the MRR (mean reciprocal rank). We also propose a framework to extract thousands of entities and relations into RDF triples, both manually and automatically, at the sentence level from 1,100 malware threat intelligence reports and from the Common Vulnerabilities and Exposures (CVE) database.
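The two metrics quoted in the abstract, hits@10 and MRR, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the triples, candidate rankings, and function names below are hypothetical and stand in for the ranked output of a link-prediction model such as TuckER. The sketch returns hits@k as a fraction; the abstract's 80.4 is presumably the same quantity expressed as a percentage.

```python
from typing import Dict, List, Sequence, Tuple

# A triple is (head entity, relation, tail entity), as in an RDF-style graph.
Triple = Tuple[str, str, str]


def rank_of(target: str, ranked_candidates: Sequence[str]) -> int:
    """Return the 1-based position of the correct entity in a ranked list."""
    return list(ranked_candidates).index(target) + 1


def hits_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Fraction of test triples whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all test triples."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical test triples whose tail entity is treated as missing.
test_triples: List[Triple] = [
    ("malware_A", "targets", "windows"),
    ("malware_B", "exploits", "cve_X"),
    ("malware_C", "usesFile", "dropper.exe"),
]

# Hypothetical ranked tail candidates, as a model might produce them.
predictions: Dict[Triple, List[str]] = {
    test_triples[0]: ["windows", "linux", "android"],
    test_triples[1]: ["cve_Y", "cve_X", "cve_Z"],
    test_triples[2]: ["a.dll", "b.dll", "c.dll", "dropper.exe"],
}

ranks = [rank_of(t[2], predictions[t]) for t in test_triples]
hits_at_2 = hits_at_k(ranks, k=2)
mrr = mean_reciprocal_rank(ranks)
print(ranks, hits_at_2, mrr)
```

With these made-up rankings the correct tails sit at positions 1, 2, and 4, so two of three triples count as hits at k=2 and the MRR is (1 + 1/2 + 1/4) / 3.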
