{"title":"Context Matrix Methods for Property and Structure Ontology Completion in Wikidata","authors":"J. A. Gómez, Thomas Hartka, B. Liang, Gavin Wiehl","doi":"10.1109/SIEDS52267.2021.9483776","DOIUrl":null,"url":null,"abstract":"Wikidata is a crowd-sourced knowledge base built by the creators of Wikipedia that applies the principles of neutrality and verifiability to data. In its more than eight years of existence, it has grown enormously, although disproportionately. Some areas are well curated and maintained, while many parts of the knowledge base are incomplete or use inconsistent classifications. Therefore, tools are needed that can use the instantiated data to infer and report structural gaps and suggest ways to address these gaps. We propose a context matrix to automatically suggest potential values for properties. This method can be extended to evaluating the ontology represented by knowledge base. In particular, it could be used to propose types and classes, supporting the discovery of ontological relationships that lend conceptual identification to the content entities. To work with the large, unlabelled data set, we first employ a pipeline to shrink the data to a minimal representation without information loss. We then process the data to build a recommendation model using property frequencies. We explore the results of these models in the context of suggesting type classifications in Wikidata and discuss potential extended applications. As a result of this work, we demonstrate approaches to contextualizing recently-added content in the knowledge base as well as proposing new connections for existing content. 
Finally, these methods could be applied to other knowledge graphs to develop similar completions for the entities contained therein.","PeriodicalId":426747,"journal":{"name":"2021 Systems and Information Engineering Design Symposium (SIEDS)","volume":"169 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 Systems and Information Engineering Design Symposium (SIEDS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SIEDS52267.2021.9483776","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Wikidata is a crowd-sourced knowledge base built by the creators of Wikipedia that applies the principles of neutrality and verifiability to data. In its more than eight years of existence, it has grown enormously, though unevenly. Some areas are well curated and maintained, while many parts of the knowledge base are incomplete or use inconsistent classifications. Therefore, tools are needed that can use the instantiated data to infer and report structural gaps and suggest ways to address them. We propose a context matrix to automatically suggest potential values for properties. This method can be extended to evaluating the ontology represented by the knowledge base. In particular, it could be used to propose types and classes, supporting the discovery of ontological relationships that lend conceptual identification to the content entities. To work with the large, unlabelled data set, we first employ a pipeline to shrink the data to a minimal representation without information loss. We then process the data to build a recommendation model using property frequencies. We explore the results of these models in the context of suggesting type classifications in Wikidata and discuss potential extended applications. As a result of this work, we demonstrate approaches to contextualizing recently added content in the knowledge base as well as proposing new connections for existing content. Finally, these methods could be applied to other knowledge graphs to develop similar completions for the entities contained therein.
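The abstract's core idea, a co-occurrence ("context") matrix over property frequencies used to rank candidate properties for an item, can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the item data, property IDs, and the `suggest` function are hypothetical, and the scoring (summed pairwise co-occurrence counts) is one simple instantiation of a frequency-based recommendation model.

```python
# Sketch of a property co-occurrence matrix for recommending properties.
# All data and names here are hypothetical; the paper's actual pipeline
# and scoring may differ.
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: Wikidata-style items mapped to their property IDs.
items = {
    "Q1": {"P31", "P21", "P569"},          # e.g. a person-like item
    "Q2": {"P31", "P21", "P569", "P106"},
    "Q3": {"P31", "P571", "P17"},          # e.g. an organization-like item
    "Q4": {"P31", "P571", "P17", "P159"},
}

# Build a symmetric co-occurrence matrix: cooc[a][b] counts how many
# items carry both property a and property b.
cooc = defaultdict(lambda: defaultdict(int))
for props in items.values():
    for a, b in combinations(sorted(props), 2):
        cooc[a][b] += 1
        cooc[b][a] += 1

def suggest(props, k=3):
    """Rank properties the item lacks by their total co-occurrence
    count with the properties it already has."""
    scores = defaultdict(int)
    for p in props:
        for q, n in cooc[p].items():
            if q not in props:
                scores[q] += n
    return sorted(scores, key=scores.get, reverse=True)[:k]

# An item with only instance-of (P31) and inception (P571) set:
# the organization-typical P17 scores highest in this toy data.
print(suggest({"P31", "P571"}))
```

The same scheme extends to the paper's type-suggestion use case: treating class membership statements as another column of the matrix lets the model rank candidate classes for an under-described item by the same co-occurrence evidence.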