An Embedding Model for Knowledge Graph Completion Based on Graph Sub-Hop Convolutional Network

Impact factor 3.5 · Zone 3 (Computer Science) · JCR Q2, Computer Science, Artificial Intelligence

Big Data Research · Pub Date: 2022-11-28 · DOI: 10.1016/j.bdr.2022.100351

Haitao He, Haoran Niu, Jianzhou Feng, Junlan Nie, Yangsen Zhang, Jiadong Ren

Volume 30, Article 100351. Full text: https://www.sciencedirect.com/science/article/pii/S2214579622000454

Citations: 0

Abstract

Research on knowledge graph completion based on representation learning increasingly depends on the structural features of nodes in the graph. However, many nodes have few immediate neighbors, so their features cannot be fully expressed from one-hop structure alone. Multi-hop structural features are therefore crucial to node representation learning. The Graph Convolutional Network (GCN) is a graph embedding model that can introduce multi-hop structure, but much of the multi-hop information is lost as it is transmitted between GCN layers. This loss leads to insufficient mining of node structural features and of the semantic feature associations among entities, which further reduces the efficiency of knowledge graph completion. To fill these gaps, a gate-controlled graph sub-hop convolutional network model for knowledge graph completion is proposed. First, a graph sub-hop convolutional network based on matrix representation is designed; it transmits multi-hop neighbor features directly to the encoded node vector, avoiding a large loss of features during multi-hop transmission. On this basis, the implicit multi-hop relations are explicitly embedded into the model based on TransE. Because enlarging the receptive field accumulates noise and redundancy at each hop of convolution, a sub-hop gate mechanism is proposed to filter information. Finally, a linear model decodes the encoded nodes to complete the knowledge graph. Experimental comparison and analysis were carried out on the WN18RR, FB15k-237, UMLS, and KINSHIP datasets. The results show that the embedding method based on sub-hop structural information fusion can greatly improve link prediction results.
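The encoder idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no equations, so the row normalization, the use of adjacency powers for each hop, the elementwise sigmoid gate, and every function and variable name below are assumptions chosen for illustration. The second helper shows the standard additive way a multi-hop relation can be composed under TransE (if h + r1 ≈ m and m + r2 ≈ t, then h + (r1 + r2) ≈ t); whether the paper composes relations exactly this way is likewise an assumption.

```python
import numpy as np


def normalize_adjacency(adj):
    """Add self-loops and row-normalize (an assumed normalization)."""
    adj = adj + np.eye(adj.shape[0])
    return adj / adj.sum(axis=1, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sub_hop_encode(adj, features, hop_weights, gate_weights):
    """Send each hop's neighbor features directly to the node encoding.

    Hop k uses the k-th power of the normalized adjacency matrix, so
    k-hop features reach the output in one step instead of passing
    through k stacked layers. A sigmoid gate (hypothetical form) scales
    each hop's message elementwise before the hops are summed.
    """
    a_norm = normalize_adjacency(adj)
    a_power = np.eye(adj.shape[0])
    encoded = np.zeros((features.shape[0], hop_weights[0].shape[1]))
    for w_hop, w_gate in zip(hop_weights, gate_weights):
        a_power = a_power @ a_norm            # normalized adjacency to the k-th power
        message = a_power @ features @ w_hop  # k-hop neighbor features
        gate = sigmoid(message @ w_gate)      # values in (0, 1) filter the message
        encoded += gate * message
    return encoded


def transe_path_relation(relation_embeddings, path):
    """Compose a multi-hop relation as the sum of its relation vectors."""
    return relation_embeddings[path].sum(axis=0)


# Toy path graph 0 - 1 - 2 - 3: node 0 has a single immediate neighbor.
rng = np.random.default_rng(0)
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
features = rng.normal(size=(4, 3))
hop_weights = [rng.normal(size=(3, 2)) for _ in range(2)]   # two hops
gate_weights = [rng.normal(size=(2, 2)) for _ in range(2)]
encoded = sub_hop_encode(adj, features, hop_weights, gate_weights)
```

In this toy graph, node 0 receives nothing from node 2 at the first hop, but the second adjacency power connects them, so node 2's features enter node 0's encoding directly and pass through that hop's gate only. A linear decoder, as the abstract mentions, would then score candidate triples from `encoded`; its form is not specified in the abstract and is left out here.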


Source journal: Big Data Research (Computer Science: Computer Science Applications)

CiteScore: 8.40

Self-citation rate: 3.00%

Journal description: The journal aims to promote and communicate advances in big data research by providing a fast, high-quality forum for researchers, practitioners, and policy makers from the many different communities working on and with this topic. The journal accepts papers on foundational aspects of dealing with big data, as well as papers on specific platforms and technologies used to deal with big data. To promote data science and interdisciplinary collaboration between fields, and to showcase the benefits of data-driven research, papers demonstrating applications of big data in domains as diverse as geoscience, the social web, finance, e-commerce, health care, environment and climate, physics and astronomy, chemistry, life sciences and drug discovery, digital libraries and scientific publications, security, and government will also be considered. Occasionally the journal may publish whitepapers on policies, standards, and best practices.