SF-GPT: A training-free method to enhance capabilities for knowledge graph construction in LLMs

Impact Factor: 5.5 · CAS Zone 2 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
Lizhuang Sun, Peng Zhang, Fang Gao, Yuan An, Zhixing Li, Yuanwei Zhao
{"title":"SF-GPT: A training-free method to enhance capabilities for knowledge graph construction in LLMs","authors":"Lizhuang Sun,&nbsp;Peng Zhang,&nbsp;Fang Gao,&nbsp;Yuan An,&nbsp;Zhixing Li,&nbsp;Yuanwei Zhao","doi":"10.1016/j.neucom.2024.128726","DOIUrl":null,"url":null,"abstract":"<div><div>Knowledge graphs (KGs) are constructed by extracting knowledge triples from text and fusing knowledge, enhancing information retrieval efficiency. Current methods for knowledge triple extraction include ”Pretrain and Fine-tuning” and Large Language Models (LLMs). The former shifts effort from manual extraction to dataset annotation and suffers from performance degradation with different test and training set distributions. LLMs-based methods face errors and incompleteness in extraction. We introduce SF-GPT, a training-free method to address these issues. Firstly, we propose the Entity Extraction Filter (EEF) module to filter triple generation results, addressing evaluation and cleansing challenges. Secondly, we introduce a training-free Entity Alignment Module based on Entity Alias Generation (EAG), tackling semantic richness and interpretability issues in LLM-based knowledge fusion. Finally, our Self-Fusion Subgraph strategy uses multi-response self-fusion and a common entity list to filter triple results, reducing noise from LLMs’ multi-responses. In experiments, SF-GPT showed a 55.5% increase in recall and a 32.6% increase in F1 score on the BDNC dataset compared to the UniRel model trained on the NYT dataset and achieved a 5% improvement in F1 score compared to GPT-4+EEF baseline on the WebNLG dataset in the case of a fusion round of three. SF-GPT offers a promising way to extract knowledge from unstructured information.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"613 ","pages":"Article 128726"},"PeriodicalIF":5.5000,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231224014978","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Knowledge graphs (KGs) are constructed by extracting knowledge triples from text and fusing the extracted knowledge, which improves information retrieval efficiency. Current methods for knowledge triple extraction include "pretrain and fine-tune" pipelines and Large Language Models (LLMs). The former shifts effort from manual extraction to dataset annotation and degrades when the test and training distributions differ. LLM-based methods suffer from errors and incompleteness in extraction. We introduce SF-GPT, a training-free method that addresses these issues. First, we propose the Entity Extraction Filter (EEF) module to filter triple generation results, addressing evaluation and cleansing challenges. Second, we introduce a training-free Entity Alignment Module based on Entity Alias Generation (EAG), tackling semantic richness and interpretability issues in LLM-based knowledge fusion. Finally, our Self-Fusion Subgraph strategy uses multi-response self-fusion and a common entity list to filter triple results, reducing the noise in LLMs' multiple responses. In experiments, SF-GPT improved recall by 55.5% and F1 score by 32.6% on the BDNC dataset over the UniRel model trained on the NYT dataset, and improved F1 score by 5% over the GPT-4+EEF baseline on the WebNLG dataset with three fusion rounds. SF-GPT offers a promising way to extract knowledge from unstructured information.
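The abstract describes three mechanisms: EEF filters generated triples against a separately extracted entity list, EAG canonicalizes entity names via LLM-generated aliases, and the Self-Fusion Subgraph strategy fuses triples across multiple LLM responses. As a rough illustration only, the minimal Python sketch below shows one plausible reading of that pipeline; the function names, the alias-table format, and the majority-vote fusion rule are assumptions made for exposition, not the paper's actual implementation.

```python
# Illustrative sketch of the SF-GPT pipeline as described in the abstract.
# All names and the majority-vote rule are hypothetical stand-ins for the
# paper's LLM-prompt-based components, not code from the paper.
from collections import Counter

def eef_filter(triples, entities):
    """Entity Extraction Filter (EEF), as sketched here: keep only triples
    whose head and tail both appear in a separately extracted entity list."""
    return [(h, r, t) for (h, r, t) in triples if h in entities and t in entities]

def align_entities(triples, alias_map):
    """Alias-based alignment (EAG), as sketched here: map each entity to a
    canonical name via an alias table, e.g. {"NYC": "New York City"}."""
    canon = lambda e: alias_map.get(e, e)
    return [(canon(h), r, canon(t)) for (h, r, t) in triples]

def self_fusion(responses, rounds=3):
    """Self-fusion over multiple LLM responses: keep triples supported by
    a majority of the `rounds` responses (an assumed fusion criterion)."""
    counts = Counter(t for resp in responses[:rounds] for t in set(resp))
    return [t for t, c in counts.items() if c > rounds / 2]

if __name__ == "__main__":
    # Three hypothetical LLM responses for the same input text.
    responses = [
        [("Paris", "capital_of", "France"), ("Paris", "located_in", "Europe")],
        [("Paris", "capital_of", "France")],
        [("Paris", "capital_of", "France"), ("France", "part_of", "Europe")],
    ]
    fused = self_fusion(responses, rounds=3)
    entities = {"Paris", "France"}  # stand-in for the common entity list
    print(eef_filter(fused, entities))  # [('Paris', 'capital_of', 'France')]
```

In this reading, fusing several responses and keeping only majority-supported, entity-grounded triples is what suppresses the hallucinated or incomplete triples that a single LLM response would contribute.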
Source journal
Neurocomputing (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Annual publication volume: 1382
Review time: 70 days
Journal introduction: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.