{"title":"Context Matrix Methods for Property and Structure Ontology Completion in Wikidata","authors":"J. A. Gómez, Thomas Hartka, B. Liang, Gavin Wiehl","doi":"10.1109/SIEDS52267.2021.9483776","DOIUrl":null,"url":null,"abstract":"Wikidata is a crowd-sourced knowledge base built by the creators of Wikipedia that applies the principles of neutrality and verifiability to data. In its more than eight years of existence, it has grown enormously, although disproportionately. Some areas are well curated and maintained, while many parts of the knowledge base are incomplete or use inconsistent classifications. Therefore, tools are needed that can use the instantiated data to infer and report structural gaps and suggest ways to address these gaps. We propose a context matrix to automatically suggest potential values for properties. This method can be extended to evaluating the ontology represented by knowledge base. In particular, it could be used to propose types and classes, supporting the discovery of ontological relationships that lend conceptual identification to the content entities. To work with the large, unlabelled data set, we first employ a pipeline to shrink the data to a minimal representation without information loss. We then process the data to build a recommendation model using property frequencies. We explore the results of these models in the context of suggesting type classifications in Wikidata and discuss potential extended applications. As a result of this work, we demonstrate approaches to contextualizing recently-added content in the knowledge base as well as proposing new connections for existing content. 
Finally, these methods could be applied to other knowledge graphs to develop similar completions for the entities contained therein.","PeriodicalId":426747,"journal":{"name":"2021 Systems and Information Engineering Design Symposium (SIEDS)","volume":"169 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 Systems and Information Engineering Design Symposium (SIEDS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SIEDS52267.2021.9483776","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Wikidata is a crowd-sourced knowledge base built by the creators of Wikipedia that applies the principles of neutrality and verifiability to data. In its more than eight years of existence, it has grown enormously, though unevenly. Some areas are well curated and maintained, while many parts of the knowledge base are incomplete or use inconsistent classifications. Therefore, tools are needed that can use the instantiated data to infer and report structural gaps and suggest ways to address them. We propose a context matrix to automatically suggest potential values for properties. This method can be extended to evaluating the ontology represented by the knowledge base. In particular, it could be used to propose types and classes, supporting the discovery of ontological relationships that lend conceptual identification to the content entities. To work with the large, unlabelled data set, we first employ a pipeline to shrink the data to a minimal representation without information loss. We then process the data to build a recommendation model using property frequencies. We explore the results of these models in the context of suggesting type classifications in Wikidata and discuss potential extended applications. As a result of this work, we demonstrate approaches to contextualizing recently added content in the knowledge base as well as proposing new connections for existing content. Finally, these methods could be applied to other knowledge graphs to develop similar completions for the entities contained therein.
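The abstract's core idea, a co-occurrence ("context") matrix over property frequencies used to rank candidate properties for an item, can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the item data, property IDs, and the `suggest` function are hypothetical, and the scoring (summed pairwise co-occurrence counts) is one simple instantiation of a frequency-based recommendation model.

```python
# Sketch of a property co-occurrence matrix for recommending properties.
# All data and names here are hypothetical; the paper's actual pipeline
# and scoring may differ.
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: Wikidata-style items mapped to their property IDs.
items = {
    "Q1": {"P31", "P21", "P569"},          # e.g. a person-like item
    "Q2": {"P31", "P21", "P569", "P106"},
    "Q3": {"P31", "P571", "P17"},          # e.g. an organization-like item
    "Q4": {"P31", "P571", "P17", "P159"},
}

# Build a symmetric co-occurrence matrix: cooc[a][b] counts how many
# items carry both property a and property b.
cooc = defaultdict(lambda: defaultdict(int))
for props in items.values():
    for a, b in combinations(sorted(props), 2):
        cooc[a][b] += 1
        cooc[b][a] += 1

def suggest(props, k=3):
    """Rank properties the item lacks by their total co-occurrence
    count with the properties it already has."""
    scores = defaultdict(int)
    for p in props:
        for q, n in cooc[p].items():
            if q not in props:
                scores[q] += n
    return sorted(scores, key=scores.get, reverse=True)[:k]

# An item with only instance-of (P31) and inception (P571) set:
# the organization-typical P17 scores highest in this toy data.
print(suggest({"P31", "P571"}))
```

The same scheme extends to the paper's type-suggestion use case: treating class membership statements as another column of the matrix lets the model rank candidate classes for an under-described item by the same co-occurrence evidence.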