{"title":"基于 PU 学习和提示学习的简单知识图谱完成模型","authors":"","doi":"10.1007/s10115-023-02040-z","DOIUrl":null,"url":null,"abstract":"<h3>Abstract</h3> <p>Knowledge graphs (KGs) are important resources for many artificial intelligence tasks but usually suffer from incompleteness, which has prompted scholars to put forward the task of knowledge graph completion (KGC). Embedding-based methods, which use the structural information of the KG for inference completion, are mainstream for this task. But these methods cannot complete the inference for the entities that do not appear in the KG and are also constrained by the structural information. To address these issues, scholars have proposed text-based methods. This type of method improves the reasoning ability of the model by utilizing pre-trained language (PLMs) models to learn textual information from the knowledge graph data. However, the performance of text-based methods lags behind that of embedding-based methods. We identify that the key reason lies in the expensive negative sampling. Positive unlabeled (PU) learning is introduced to help collect negative samples with high confidence from a small number of samples, and prompt learning is introduced to produce good training results. The proposed PLM-based KGC model outperforms earlier text-based methods and rivals earlier embedding-based approaches on several benchmark datasets. By exploiting the structural information of KGs, the proposed model also has a satisfactory performance in inference speed.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"30 1","pages":""},"PeriodicalIF":2.5000,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Simple knowledge graph completion model based on PU learning and prompt learning\",\"authors\":\"\",\"doi\":\"10.1007/s10115-023-02040-z\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<h3>Abstract</h3> <p>Knowledge graphs (KGs) are important resources for many artificial intelligence tasks but usually suffer from incompleteness, which has prompted scholars to put forward the task of knowledge graph completion (KGC). Embedding-based methods, which use the structural information of the KG for inference completion, are mainstream for this task. But these methods cannot complete the inference for the entities that do not appear in the KG and are also constrained by the structural information. To address these issues, scholars have proposed text-based methods. This type of method improves the reasoning ability of the model by utilizing pre-trained language (PLMs) models to learn textual information from the knowledge graph data. However, the performance of text-based methods lags behind that of embedding-based methods. We identify that the key reason lies in the expensive negative sampling. Positive unlabeled (PU) learning is introduced to help collect negative samples with high confidence from a small number of samples, and prompt learning is introduced to produce good training results. The proposed PLM-based KGC model outperforms earlier text-based methods and rivals earlier embedding-based approaches on several benchmark datasets. By exploiting the structural information of KGs, the proposed model also has a satisfactory performance in inference speed.</p>\",\"PeriodicalId\":54749,\"journal\":{\"name\":\"Knowledge and Information Systems\",\"volume\":\"30 1\",\"pages\":\"\"},\"PeriodicalIF\":2.5000,\"publicationDate\":\"2024-01-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Knowledge and Information Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s10115-023-02040-z\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge and Information Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10115-023-02040-z","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
摘要
摘要 知识图谱(KG)是许多人工智能任务的重要资源,但通常存在不完备性,这促使学者们提出了知识图谱完备(KGC)任务。基于嵌入的方法利用知识图谱的结构信息来完成推理,是该任务的主流方法。但这些方法无法完成未在知识图谱中出现的实体的推理,而且还受到结构信息的限制。针对这些问题,学者们提出了基于文本的方法。这类方法利用预先训练好的语言(PLMs)模型从知识图谱数据中学习文本信息,从而提高了模型的推理能力。然而,基于文本的方法的性能落后于基于嵌入的方法。我们发现,关键原因在于昂贵的负采样。我们引入了正向无标记(PU)学习,以帮助从少量样本中收集高置信度的负样本,并引入了及时学习,以产生良好的训练效果。所提出的基于 PLM 的 KGC 模型在多个基准数据集上的表现优于早期的基于文本的方法,也可与早期的基于嵌入的方法相媲美。通过利用 KG 的结构信息,所提出的模型在推理速度方面也有令人满意的表现。
Simple knowledge graph completion model based on PU learning and prompt learning
Abstract
Knowledge graphs (KGs) are important resources for many artificial intelligence tasks but usually suffer from incompleteness, which has prompted scholars to put forward the task of knowledge graph completion (KGC). Embedding-based methods, which use the structural information of the KG for inference completion, are mainstream for this task. But these methods cannot complete the inference for the entities that do not appear in the KG and are also constrained by the structural information. To address these issues, scholars have proposed text-based methods. This type of method improves the reasoning ability of the model by utilizing pre-trained language (PLMs) models to learn textual information from the knowledge graph data. However, the performance of text-based methods lags behind that of embedding-based methods. We identify that the key reason lies in the expensive negative sampling. Positive unlabeled (PU) learning is introduced to help collect negative samples with high confidence from a small number of samples, and prompt learning is introduced to produce good training results. The proposed PLM-based KGC model outperforms earlier text-based methods and rivals earlier embedding-based approaches on several benchmark datasets. By exploiting the structural information of KGs, the proposed model also has a satisfactory performance in inference speed.
期刊介绍:
Knowledge and Information Systems (KAIS) provides an international forum for researchers and professionals to share their knowledge and report new advances on all topics related to knowledge systems and advanced information systems. This monthly peer-reviewed archival journal publishes state-of-the-art research reports on emerging topics in KAIS, reviews of important techniques in related areas, and application papers of interest to a general readership.