SF-GPT: A training-free method to enhance capabilities for knowledge graph construction in LLMs

Impact Factor: 5.5 · CAS Zone 2 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
Lizhuang Sun, Peng Zhang, Fang Gao, Yuan An, Zhixing Li, Yuanwei Zhao
{"title":"SF-GPT: A training-free method to enhance capabilities for knowledge graph construction in LLMs","authors":"Lizhuang Sun,&nbsp;Peng Zhang,&nbsp;Fang Gao,&nbsp;Yuan An,&nbsp;Zhixing Li,&nbsp;Yuanwei Zhao","doi":"10.1016/j.neucom.2024.128726","DOIUrl":null,"url":null,"abstract":"<div><div>Knowledge graphs (KGs) are constructed by extracting knowledge triples from text and fusing knowledge, enhancing information retrieval efficiency. Current methods for knowledge triple extraction include ”Pretrain and Fine-tuning” and Large Language Models (LLMs). The former shifts effort from manual extraction to dataset annotation and suffers from performance degradation with different test and training set distributions. LLMs-based methods face errors and incompleteness in extraction. We introduce SF-GPT, a training-free method to address these issues. Firstly, we propose the Entity Extraction Filter (EEF) module to filter triple generation results, addressing evaluation and cleansing challenges. Secondly, we introduce a training-free Entity Alignment Module based on Entity Alias Generation (EAG), tackling semantic richness and interpretability issues in LLM-based knowledge fusion. Finally, our Self-Fusion Subgraph strategy uses multi-response self-fusion and a common entity list to filter triple results, reducing noise from LLMs’ multi-responses. In experiments, SF-GPT showed a 55.5% increase in recall and a 32.6% increase in F1 score on the BDNC dataset compared to the UniRel model trained on the NYT dataset and achieved a 5% improvement in F1 score compared to GPT-4+EEF baseline on the WebNLG dataset in the case of a fusion round of three. SF-GPT offers a promising way to extract knowledge from unstructured information.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"613 ","pages":"Article 128726"},"PeriodicalIF":5.5000,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231224014978","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Knowledge graphs (KGs) are constructed by extracting knowledge triples from text and fusing the extracted knowledge, which improves information retrieval efficiency. Current methods for knowledge triple extraction include "pretrain and fine-tune" pipelines and Large Language Models (LLMs). The former shifts effort from manual extraction to dataset annotation and degrades when the test and training distributions differ. LLM-based methods suffer from errors and incompleteness in extraction. We introduce SF-GPT, a training-free method that addresses these issues. First, we propose the Entity Extraction Filter (EEF) module to filter triple generation results, addressing evaluation and cleansing challenges. Second, we introduce a training-free Entity Alignment Module based on Entity Alias Generation (EAG), tackling semantic richness and interpretability issues in LLM-based knowledge fusion. Finally, our Self-Fusion Subgraph strategy uses multi-response self-fusion and a common entity list to filter triple results, reducing the noise in LLMs' multiple responses. In experiments, SF-GPT improved recall by 55.5% and F1 score by 32.6% on the BDNC dataset over the UniRel model trained on the NYT dataset, and improved F1 score by 5% over the GPT-4+EEF baseline on the WebNLG dataset with three fusion rounds. SF-GPT offers a promising way to extract knowledge from unstructured information.
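The abstract describes three mechanisms: EEF filters generated triples against a separately extracted entity list, EAG canonicalizes entity names via LLM-generated aliases, and the Self-Fusion Subgraph strategy fuses triples across multiple LLM responses. As a rough illustration only, the minimal Python sketch below shows one plausible reading of that pipeline; the function names, the alias-table format, and the majority-vote fusion rule are assumptions made for exposition, not the paper's actual implementation.

```python
# Illustrative sketch of the SF-GPT pipeline as described in the abstract.
# All names and the majority-vote rule are hypothetical stand-ins for the
# paper's LLM-prompt-based components, not code from the paper.
from collections import Counter

def eef_filter(triples, entities):
    """Entity Extraction Filter (EEF), as sketched here: keep only triples
    whose head and tail both appear in a separately extracted entity list."""
    return [(h, r, t) for (h, r, t) in triples if h in entities and t in entities]

def align_entities(triples, alias_map):
    """Alias-based alignment (EAG), as sketched here: map each entity to a
    canonical name via an alias table, e.g. {"NYC": "New York City"}."""
    canon = lambda e: alias_map.get(e, e)
    return [(canon(h), r, canon(t)) for (h, r, t) in triples]

def self_fusion(responses, rounds=3):
    """Self-fusion over multiple LLM responses: keep triples supported by
    a majority of the `rounds` responses (an assumed fusion criterion)."""
    counts = Counter(t for resp in responses[:rounds] for t in set(resp))
    return [t for t, c in counts.items() if c > rounds / 2]

if __name__ == "__main__":
    # Three hypothetical LLM responses for the same input text.
    responses = [
        [("Paris", "capital_of", "France"), ("Paris", "located_in", "Europe")],
        [("Paris", "capital_of", "France")],
        [("Paris", "capital_of", "France"), ("France", "part_of", "Europe")],
    ]
    fused = self_fusion(responses, rounds=3)
    entities = {"Paris", "France"}  # stand-in for the common entity list
    print(eef_filter(fused, entities))  # [('Paris', 'capital_of', 'France')]
```

In this reading, fusing several responses and keeping only majority-supported, entity-grounded triples is what suppresses the hallucinated or incomplete triples that a single LLM response would contribute.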
Source journal
Neurocomputing (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Annual publication volume: 1382
Review time: 70 days
Journal introduction: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.