{"title":"Graph Structure Enhanced Pre-Training Language Model for Knowledge Graph Completion","authors":"Huashi Zhu;Dexuan Xu;Yu Huang;Zhi Jin;Weiping Ding;Jiahui Tong;Guoshuang Chong","doi":"10.1109/TETCI.2024.3372442","DOIUrl":null,"url":null,"abstract":"A vast amount of textual and structural information is required for knowledge graph construction and its downstream tasks. However, most of the current knowledge graphs are incomplete due to the difficulty of knowledge acquisition and integration. Knowledge Graph Completion (KGC) is used to predict missing connections. In previous studies, textual information and graph structural information are utilized independently, without an effective method for fusing these two types of information. In this paper, we propose a graph structure enhanced pre-training language model for knowledge graph completion. Firstly, we design a graph sampling algorithm and a Graph2Seq module for constructing sub-graphs and their corresponding contexts to support large-scale knowledge graph learning and parallel training. It is also the basis for fusing textual data and graph structure. Next, two pre-training tasks based on masked modeling are designed for capturing accurate entity-level and relation-level information. Furthermore, this paper proposes a novel asymmetric Encoder-Decoder architecture to restore masked components, where the encoder is a Pre-trained Language Model (PLM) and the decoder is a multi-relational Graph Neural Network (GNN). The purpose of the architecture is to integrate textual information effectively with graph structural information. Finally, the model is fine-tuned for KGC tasks on two widely used public datasets. The experiments show that the model achieves excellent performance and outperforms baselines in most metrics, which demonstrate the effectiveness of our approach by fusing the structure and semantic information to knowledge graph.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"8 4","pages":"2697-2708"},"PeriodicalIF":5.3000,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Emerging Topics in Computational Intelligence","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10475356/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
A vast amount of textual and structural information is required for knowledge graph construction and its downstream tasks. However, most current knowledge graphs are incomplete because knowledge acquisition and integration are difficult. Knowledge Graph Completion (KGC) aims to predict these missing connections. In previous studies, textual information and graph structural information were used independently, without an effective method for fusing the two. In this paper, we propose a graph structure enhanced pre-training language model for knowledge graph completion. First, we design a graph sampling algorithm and a Graph2Seq module that construct sub-graphs and their corresponding contexts, supporting large-scale knowledge graph learning and parallel training; this also provides the basis for fusing textual data with graph structure. Next, two pre-training tasks based on masked modeling are designed to capture accurate entity-level and relation-level information. Furthermore, we propose a novel asymmetric Encoder-Decoder architecture to restore the masked components, in which the encoder is a Pre-trained Language Model (PLM) and the decoder is a multi-relational Graph Neural Network (GNN); the architecture is designed to integrate textual information effectively with graph structural information. Finally, the model is fine-tuned for KGC tasks on two widely used public datasets. Experiments show that the model achieves excellent performance and outperforms the baselines on most metrics, demonstrating the effectiveness of fusing structural and semantic information into the knowledge graph.
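Since the abstract does not include implementation details, the following is only a minimal PyTorch sketch of how such an asymmetric architecture could be wired together: a transformer encoder stands in for the PLM, node representations are pooled from the encoded Graph2Seq context, and an RGCN-style multi-relational GNN decodes them to restore masked entities. All class names, dimensions, the pooling step, and the scoring head are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the asymmetric encoder-decoder described in the
# abstract. A small transformer encoder stands in for the pre-trained
# language model; the decoder is a simplified multi-relational GNN.
import torch
import torch.nn as nn

class RelationalGNNLayer(nn.Module):
    """One message-passing layer with a separate weight matrix per relation
    type (an RGCN-style simplification of a multi-relational GNN)."""
    def __init__(self, dim: int, num_relations: int):
        super().__init__()
        self.rel_weights = nn.Parameter(torch.empty(num_relations, dim, dim))
        nn.init.xavier_uniform_(self.rel_weights)
        self.self_loop = nn.Linear(dim, dim)

    def forward(self, h, edge_index, edge_type):
        # h: (num_nodes, dim); edge_index: (2, num_edges); edge_type: (num_edges,)
        src, dst = edge_index
        # Transform each source node's state by its edge's relation matrix.
        msgs = torch.bmm(h[src].unsqueeze(1), self.rel_weights[edge_type]).squeeze(1)
        out = self.self_loop(h).index_add(0, dst, msgs)
        # Normalize by in-degree (plus the self-loop) and apply nonlinearity.
        deg = torch.zeros(h.size(0), device=h.device).index_add(
            0, dst, torch.ones_like(edge_type, dtype=h.dtype))
        return torch.relu(out / (1.0 + deg).unsqueeze(1))

class GraphEnhancedKGC(nn.Module):
    def __init__(self, vocab_size, num_entities, num_relations, dim=128):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, dim)
        # Stand-in for the pre-trained language model encoder.
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.gnn = nn.ModuleList(
            [RelationalGNNLayer(dim, num_relations) for _ in range(2)])
        self.entity_head = nn.Linear(dim, num_entities)  # restores masked entities

    def forward(self, token_ids, node_token_pos, edge_index, edge_type):
        # 1) Encode the Graph2Seq context (linearized sub-graph) with the "PLM".
        ctx = self.encoder(self.tok_emb(token_ids))   # (1, seq_len, dim)
        # 2) Pool each node's representation from its token position.
        h = ctx[0, node_token_pos]                    # (num_nodes, dim)
        # 3) Decode with the multi-relational GNN over sub-graph edges.
        for layer in self.gnn:
            h = layer(h, edge_index, edge_type)
        return self.entity_head(h)                    # per-node entity logits
```

The asymmetry in this reading is that a heavyweight text encoder supplies semantically rich node states, while a lightweight structural decoder only has to propagate them along typed edges, which is one plausible way to fuse textual and graph structural information as the abstract describes.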
About the Journal
The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys.
TETCI is an electronic-only publication and publishes six issues per year.
Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. Illustrative examples include glial cell networks, computational neuroscience, brain-computer interfaces, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for IoT and Smart-X technologies.