Linking Entities from Text to Hundreds of RDF Datasets for Enabling Large Scale Entity Enrichment

Science of aging knowledge environment : SAGE KE Pub Date : 2021-12-24 DOI:10.3390/knowledge2010001

M. Mountantonakis, Yannis Tzitzikas


Citations: 2

Abstract

A growing number of approaches take a text as input and perform named entity recognition (or extraction) to link the recognized entities to RDF knowledge bases (or datasets). This makes it feasible to retrieve more information about these entities, which can be of primary importance for several tasks, e.g., facilitating manual annotation, hyperlink creation, content enrichment and improving data veracity. However, current approaches link the extracted entities to one or a few knowledge bases; therefore, it is not feasible to retrieve the URIs and facts of each recognized entity from multiple datasets or to discover the most relevant datasets for one or more extracted entities. To enable this functionality, we introduce a research prototype, called LODsyndesisIE, which exploits three widely used Named Entity Recognition and Disambiguation tools (i.e., DBpedia Spotlight, WAT and Stanford CoreNLP) to recognize the entities of a given text. It then links these entities to the LODsyndesis knowledge base, which offers data enrichment and discovery services for millions of entities over hundreds of RDF datasets. We introduce all the steps of LODsyndesisIE and explain how to exploit its services through its online application and its REST API. For the evaluation, we use three collections of texts: (i) to compare the effectiveness of combining different Named Entity Recognition tools, (ii) to measure the gain in enrichment from linking the extracted entities to LODsyndesis instead of a single or a few RDF datasets and (iii) to evaluate the efficiency of LODsyndesisIE.
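The pipeline outlined in the abstract (several recognition tools, then linking, then enrichment across datasets) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Mention` type, the majority-vote rule in `combine_mentions`, the `enrich` lookup, the tool names and the `example.org` URIs are hypothetical and are not taken from LODsyndesisIE or its REST API. The abstract does not state how the tools' outputs are combined, so a simple voting rule stands in here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class Mention(NamedTuple):
    """One entity annotation produced by one (hypothetical) NER tool."""
    surface: str  # text span recognized as an entity
    uri: str      # candidate URI proposed by the tool
    tool: str     # name of the tool that produced the annotation


def combine_mentions(mentions: Iterable[Mention], min_votes: int = 1) -> Dict[str, str]:
    """Keep, per surface form, the URI proposed by the most distinct tools.

    Illustrative voting rule only; not the authors' combination method.
    """
    votes: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for m in mentions:
        votes[m.surface][m.uri].add(m.tool)

    linked: Dict[str, str] = {}
    for surface, by_uri in votes.items():
        # Most votes first; ties broken alphabetically so the result is deterministic.
        uri, tools = min(by_uri.items(), key=lambda kv: (-len(kv[1]), kv[0]))
        if len(tools) >= min_votes:
            linked[surface] = uri
    return linked


def enrich(linked: Dict[str, str], equivalences: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Expand each linked URI to all equivalent URIs known to a toy index.

    `equivalences` is a stand-in for a cross-dataset knowledge base.
    """
    return {s: sorted(equivalences.get(u, {u})) for s, u in linked.items()}


if __name__ == "__main__":
    mentions = [
        Mention("Heraklion", "http://example.org/a/Heraklion", "tool_a"),
        Mention("Heraklion", "http://example.org/a/Heraklion", "tool_b"),
        Mention("Heraklion", "http://example.org/a/Heraklion_region", "tool_c"),
        Mention("Crete", "http://example.org/a/Crete", "tool_a"),
    ]
    equivalences = {
        "http://example.org/a/Heraklion": {
            "http://example.org/a/Heraklion",
            "http://example.org/b/heraklion",
        }
    }
    linked = combine_mentions(mentions, min_votes=2)
    print(enrich(linked, equivalences))
```

Raising `min_votes` trades recall for precision: with `min_votes=2` only entities on which at least two tools agree are kept, while `min_votes=1` keeps every recognized entity.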

