Recoin:维基数据的相对完整性

Companion Proceedings of the The Web Conference 2018 Pub Date : 2018-04-23 DOI:10.1145/3184558.3191641

Vevake Balaraman, Simon Razniewski, Werner Nutt

{"title":"Recoin:维基数据的相对完整性","authors":"Vevake Balaraman, Simon Razniewski, Werner Nutt","doi":"10.1145/3184558.3191641","DOIUrl":null,"url":null,"abstract":"The collaborative knowledge base Wikidata is the central storage of Wikimedia projects, containing over 45 million data items. It acts as the hub for interlinking Wikipedia pages about a specific item in different languages, automates features such as infoboxes in Wikipedia, and is increasingly used for other applications such as data enrichment and question answering. Tracking the quality of Wikidata is an important issue for this project. In this paper we focus particularly on the completeness aspect. Several automated techniques have been adopted by Wikis to track and manage completeness, yet these techniques are generally subjective and do not provide a clear quality estimate at the level of entities. In this paper, we present an approach towards measuring Relative Completeness in Wikidata by comparison with data present for similar entities. This relative completeness approach is easily scalable with the introduction of new classes in the knowledge base, and has been implemented for all available entities in Wikidata. The results provide an intuition on the completeness of an entity comparing it with other similar entities. Here, we present our implementation approach along with a discussion on strategies and open challenges.","PeriodicalId":235572,"journal":{"name":"Companion Proceedings of the The Web Conference 2018","volume":"2017 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"57","resultStr":"{\"title\":\"Recoin: Relative Completeness in Wikidata\",\"authors\":\"Vevake Balaraman, Simon Razniewski, Werner Nutt\",\"doi\":\"10.1145/3184558.3191641\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The collaborative knowledge base Wikidata is the central storage of Wikimedia projects, containing over 45 million data items. It acts as the hub for interlinking Wikipedia pages about a specific item in different languages, automates features such as infoboxes in Wikipedia, and is increasingly used for other applications such as data enrichment and question answering. Tracking the quality of Wikidata is an important issue for this project. In this paper we focus particularly on the completeness aspect. Several automated techniques have been adopted by Wikis to track and manage completeness, yet these techniques are generally subjective and do not provide a clear quality estimate at the level of entities. In this paper, we present an approach towards measuring Relative Completeness in Wikidata by comparison with data present for similar entities. This relative completeness approach is easily scalable with the introduction of new classes in the knowledge base, and has been implemented for all available entities in Wikidata. The results provide an intuition on the completeness of an entity comparing it with other similar entities. Here, we present our implementation approach along with a discussion on strategies and open challenges.\",\"PeriodicalId\":235572,\"journal\":{\"name\":\"Companion Proceedings of the The Web Conference 2018\",\"volume\":\"2017 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-04-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"57\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Companion Proceedings of the The Web Conference 2018\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3184558.3191641\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Companion Proceedings of the The Web Conference 2018","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3184558.3191641","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 57

摘要

协作知识库Wikidata是维基媒体项目的中央存储，包含超过4500万个数据项。它的作用是将不同语言的特定条目的维基百科页面相互链接，使维基百科中的信息框等功能自动化，并且越来越多地用于数据充实和问答等其他应用程序。跟踪维基数据的质量是这个项目的一个重要问题。在本文中，我们特别关注完备性方面。wiki采用了几种自动化技术来跟踪和管理完整性，但是这些技术通常是主观的，不能在实体级别上提供清晰的质量评估。在本文中，我们提出了一种通过与类似实体的数据进行比较来衡量维基数据相对完备性的方法。通过在知识库中引入新类，这种相对完备的方法很容易扩展，并且已经在Wikidata中为所有可用实体实现。结果提供了对实体与其他类似实体进行比较的完整性的直觉。在这里，我们提出了我们的实施方法，并讨论了战略和开放的挑战。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Recoin: Relative Completeness in Wikidata

The collaborative knowledge base Wikidata is the central storage of Wikimedia projects, containing over 45 million data items. It acts as the hub for interlinking Wikipedia pages about a specific item in different languages, automates features such as infoboxes in Wikipedia, and is increasingly used for other applications such as data enrichment and question answering. Tracking the quality of Wikidata is an important issue for this project. In this paper we focus particularly on the completeness aspect. Several automated techniques have been adopted by Wikis to track and manage completeness, yet these techniques are generally subjective and do not provide a clear quality estimate at the level of entities. In this paper, we present an approach towards measuring Relative Completeness in Wikidata by comparison with data present for similar entities. This relative completeness approach is easily scalable with the introduction of new classes in the knowledge base, and has been implemented for all available entities in Wikidata. The results provide an intuition on the completeness of an entity comparing it with other similar entities. Here, we present our implementation approach along with a discussion on strategies and open challenges.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Companion Proceedings of the The Web Conference 2018

自引率

0.00%

发文量