Infer the missing facts of D3FEND using knowledge graph representation learning
Authors: A. Khobragade, S. Ghumbre, V. Pachghare
Journal: International Journal of Web Information Systems (impact factor 2.5; JCR Q2, Computer Science, Information Systems)
Published: 2023-08-16 (journal article)
DOI: https://doi.org/10.1108/ijwis-03-2023-0042
Citations: 0
Abstract
Purpose
MITRE and the National Security Agency cooperatively developed and maintain the D3FEND knowledge graph (KG). It models concepts from the cybersecurity countermeasure domain, such as dynamic, emulated and file analysis, as entities. These entities are linked by relationships such as analyze, may_contains and encrypt. A fundamental challenge for the collaborating designers is to encode knowledge and efficiently interrelate the cyber-domain facts generated daily. At present, the designers update the graph contents manually with new or missing facts to enrich the knowledge. This paper proposes an automated approach that predicts the missing facts through a link prediction task, using embeddings as the learned representation.
Design/methodology/approach
D3FEND is available in the Resource Description Framework (RDF) format. In the preprocessing step, the RDF facts are converted to subject–predicate–object triples, yielding a data set of 5,967 entities and 98 relationship types. Distance-based, bilinear and convolutional embedding models are then applied in turn to learn the embeddings of entities and relations. This study uses the learned embeddings in a link prediction task to infer missing facts.
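As an illustration of how a distance-based model can score and rank candidate facts, the following is a minimal sketch in the style of TransE. The abstract does not name the specific models used, so the scoring function, the hand-picked 2-D vectors and the entity names (`DynamicAnalysis`, `ExecutableFile` and so on) are assumptions for illustration only; only the relation name `analyze` comes from the text above.

```python
import math

# Toy 2-D embeddings, hand-picked for illustration.
# They are NOT learned and NOT taken from the paper; entity names are hypothetical.
entity_emb = {
    "DynamicAnalysis": (0.0, 0.0),
    "FileAnalysis": (0.2, 0.1),
    "ExecutableFile": (1.0, 1.0),
    "NetworkTraffic": (-1.0, 0.5),
}
relation_emb = {"analyze": (1.0, 1.0)}


def transe_score(head, relation, tail):
    """TransE-style plausibility: negative L2 distance between h + r and t.

    A score closer to zero means the triple (head, relation, tail) is more plausible.
    """
    h = entity_emb[head]
    r = relation_emb[relation]
    t = entity_emb[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation):
    """Rank every other entity as a candidate answer to (head, relation, ?)."""
    candidates = [e for e in entity_emb if e != head]
    return sorted(
        candidates,
        key=lambda tail: transe_score(head, relation, tail),
        reverse=True,
    )


# Link prediction query: which entity most plausibly completes the triple?
ranking = rank_tails("DynamicAnalysis", "analyze")
print(ranking)
```

In a real setting the vectors would be learned from the training triples, and the top-ranked candidates that are not already in the graph would be proposed as possible missing facts for a designer to review.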
Findings
Experimental results show that the translational model performs well on high-rank results, whereas the bilinear model is superior in capturing the latent semantics of complex relationship types. However, the convolutional model outperforms 44% of the true facts and achieves a 3% improvement in results compared to other models.
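Link prediction results of this kind are commonly summarized with rank-based metrics such as Hits@k and mean reciprocal rank (MRR). The abstract does not state which metrics the paper reports, so the sketch below is an assumption about a typical evaluation, and the list of ranks is hypothetical data rather than a result from the study.

```python
def hits_at_k(ranks, k):
    """Fraction of test facts whose true entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test facts; 1.0 means every fact ranked first."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks (1 = best) of the true entity for five test triples.
ranks = [1, 3, 2, 10, 1]

hits1 = hits_at_k(ranks, 1)
hits3 = hits_at_k(ranks, 3)
mrr = mean_reciprocal_rank(ranks)
print(hits1, hits3, round(mrr, 4))
```

Hits@1 rewards only exact top predictions, while MRR also gives partial credit to true facts ranked slightly lower, which is one way models can differ between "high-rank" performance and overall ranking quality.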
Research limitations/implications
Despite the success of embedding models in enriching D3FEND through link prediction under a supervised learning setup, the approach has limitations, such as not capturing the diversity and hierarchies of relations. The average node degree of the D3FEND KG is 16.85, and 12% of entities have a node degree of less than 2; that is, many entities and relations have few or no observed links. This results in sparsity and data imbalance, which affect model performance even after the embedding vector size is increased. Moreover, KG embedding models consider only existing entities and relations and may not incorporate external or contextual information, such as textual descriptions, temporal dynamics or domain knowledge, that could enhance link prediction performance.
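The sparsity statistics mentioned here can be computed directly from the triple set. The sketch below assumes that a node's degree is the number of triples in which it appears as subject or object; the abstract does not specify the exact definition used, and the triples and entity names are toy data rather than D3FEND content.

```python
from collections import Counter

# Toy subject-predicate-object triples; entity names A-D are hypothetical.
triples = [
    ("A", "analyze", "B"),
    ("A", "analyze", "C"),
    ("B", "may_contains", "C"),
    ("D", "encrypt", "A"),
]

# Count one unit of degree for each appearance as subject or as object.
degree = Counter()
for subj, _pred, obj in triples:
    degree[subj] += 1
    degree[obj] += 1

avg_degree = sum(degree.values()) / len(degree)
# Share of entities that take part in fewer than two triples.
low_degree_share = sum(1 for d in degree.values() if d < 2) / len(degree)

print(f"average degree: {avg_degree:.2f}")
print(f"share of entities with degree < 2: {low_degree_share:.0%}")
```

Entities in the low-degree group contribute very few training triples, which is why their embeddings tend to be poorly constrained regardless of vector size.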
Practical implications
Link prediction in the D3FEND KG can benefit cybersecurity countermeasure strategies in several ways. It can help to identify gaps or weaknesses in existing defensive methods and suggest possible ways to improve or augment them; to compare and contrast different defensive methods and understand their trade-offs and synergies; to discover novel or emerging defensive methods by inferring new relations from existing data or external sources; and to generate recommendations or guidance for selecting or deploying appropriate defensive methods based on the characteristics and objectives of the system or network.
Originality/value
The representation learning approach helps to reduce the incompleteness of D3FEND through link prediction, which infers possible missing facts from the graph's existing entities and relations.
About the journal:
The Global Information Infrastructure is a daily reality. Despite the many applications in all domains of our societies (e-business, e-commerce, e-learning, e-science and e-government, for instance) and despite the tremendous advances made by engineers and scientists, the seamless development of Web information systems and services remains a major challenge. The journal examines how the current shared vision for the future is one of semantically rich information and service-oriented architecture for global information systems. This vision lies at the convergence of progress in technologies such as XML, Web services, RDF and OWL; in multimedia, multimodal and multilingual information retrieval; and in distributed, mobile and ubiquitous computing. Topicality: While the International Journal of Web Information Systems covers a broad range of topics, the journal welcomes papers that provide a perspective on all aspects of Web information systems: Web semantics and Web dynamics, Web mining and searching, Web databases and Web data integration, Web-based commerce and e-business, Web collaboration and distributed computing, Internet computing and networks, performance of Web applications, and Web multimedia services and Web-based education.