Applying a Context-based Method to Build a Knowledge Graph for the Blue Amazon

Data Intelligence · Pub Date: 2024-03-11 · DOI: 10.1162/dint_a_00223

P. D. M. Ligabue, A. Brandão, S. M. Peres, F. G. Cozman, Paulo Pirozelli


Citation count: 0

Abstract

Knowledge graphs are employed in several tasks, such as question answering and recommendation systems, due to their ability to represent relationships between concepts. Automatically constructing such graphs, however, remains an unresolved challenge within knowledge representation. To tackle this challenge, we propose CtxKG, a method specifically aimed at extracting knowledge graphs in a context of limited resources in which the only input is a set of unstructured text documents. CtxKG is based on OpenIE (a relationship triple extraction method) and BERT (a language model) and contains four stages: the extraction of relationship triples directly from text; the identification of synonyms across triples; the merging of similar entities; and the building of bridges between knowledge graphs of different documents. Our method distinguishes itself from those in the current literature (i) through its use of the parse tree to avoid the overlapping entities produced by base implementations of OpenIE; and (ii) through its bridges, which create a connected network of graphs, overcoming a limitation of similar methods, which produce one isolated graph per document. We compare our method to two others by generating graphs for movie articles from Wikipedia and contrasting them with benchmark graphs built from the OMDb movie database. Our results suggest that our method is able to improve multiple aspects of knowledge graph construction. They also highlight the critical role that triple identification and named-entity recognition have in improving the quality of automatically generated graphs, suggesting future paths for investigation. Finally, we apply CtxKG to build BlabKG, a knowledge graph for the Blue Amazon, and discuss possible improvements.
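To make the last two stages named in the abstract more concrete (merging similar entities and building bridges between per-document graphs), the following is a minimal illustrative sketch and not the authors' implementation. It assumes triples have already been extracted, and it substitutes a character-level similarity from Python's standard `difflib` for the BERT-based comparison the abstract mentions. The function names `similar`, `merge_entities`, and `build_bridges`, the 0.8 threshold, and the toy triples are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

# A relationship triple: (subject, relation, object).
Triple = tuple[str, str, str]


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Assumption: a string-similarity ratio stands in for embedding similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def merge_entities(triples: list[Triple], threshold: float = 0.8) -> list[Triple]:
    """Rewrite triples so that similar entity mentions share one canonical name."""
    canonical: dict[str, str] = {}
    representatives: list[str] = []
    for subj, _, obj in triples:
        for entity in (subj, obj):
            if entity in canonical:
                continue
            # Reuse the first existing representative that is similar enough.
            match = next(
                (rep for rep in representatives if similar(entity, rep, threshold)),
                None,
            )
            if match is None:
                representatives.append(entity)
                match = entity
            canonical[entity] = match
    return [(canonical[s], rel, canonical[o]) for s, rel, o in triples]


def build_bridges(graphs: dict[str, list[Triple]], threshold: float = 0.8):
    """Link similar entities that appear in the graphs of different documents."""
    entities = {
        doc: sorted({e for s, _, o in triples for e in (s, o)})
        for doc, triples in graphs.items()
    }
    bridges = []
    for (doc_a, ents_a), (doc_b, ents_b) in combinations(sorted(entities.items()), 2):
        for a in ents_a:
            for b in ents_b:
                if similar(a, b, threshold):
                    bridges.append(((doc_a, a), (doc_b, b)))
    return bridges


# Toy input, invented for illustration only.
merged = merge_entities([
    ("BlabKG", "describes", "Blue Amazon"),
    ("the Blue Amazon", "is described by", "BlabKG"),
])
bridges = build_bridges({
    "doc1": [("CtxKG", "builds", "BlabKG")],
    "doc2": [("BlabKG", "describes", "Blue Amazon")],
})
```

In this sketch a bridge is just a pair of `(document, entity)` references, so each document keeps its own graph while still being connected to the others, which is the property the abstract attributes to bridges.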




Source journal

Data Intelligence

CiteScore

6.60

Self-citation rate

0.00%

Articles published