Wiki2Prop:从维基百科预测维基数据属性的多模式方法

Michael Luggen, J. Audiffren, D. Difallah, P. Cudré-Mauroux
{"title":"Wiki2Prop:从维基百科预测维基数据属性的多模式方法","authors":"Michael Luggen, J. Audiffren, D. Difallah, P. Cudré-Mauroux","doi":"10.1145/3442381.3450082","DOIUrl":null,"url":null,"abstract":"Wikidata is rapidly emerging as a key resource for a multitude of online tasks such as Speech Recognition, Entity Linking, Question Answering, or Semantic Search. The value of Wikidata is directly linked to the rich information associated with each entity – that is, the properties describing each entity as well as the relationships to other entities. Despite the tremendous manual and automatic efforts the community invested in the Wikidata project, the growing number of entities (now more than 100 million) presents multiple challenges in terms of knowledge gaps in the graph that are hard to track. To help guide the community in filling the gaps in Wikidata, we propose to identify and rank the properties that an entity might be missing. In this work, we focus on entities with a dedicated Wikipedia page in any language to make predictions directly based on textual content. We show that this problem can be formulated as a multi-label classification problem where every property defined in Wikidata is a potential label. Our main contribution, Wiki2Prop, solves this problem using a multimodal Deep Learning method to predict which properties should be attached to a given entity, using its Wikipedia page embeddings. Moreover, Wiki2Prop is able to incorporate additional features in the form of multilingual embeddings and multimodal data such as images whenever available. We empirically evaluate our approach against the state of the art and show how Wiki2Prop significantly outperforms its competitors for the task of property prediction in Wikidata, and how the use of multilingual and multimodal data improves the results further. Finally, we make Wiki2Prop available as a property recommender system that can be activated and used directly in the context of a Wikidata entity page.","PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Wiki2Prop: A Multimodal Approach for Predicting Wikidata Properties from Wikipedia\",\"authors\":\"Michael Luggen, J. Audiffren, D. Difallah, P. Cudré-Mauroux\",\"doi\":\"10.1145/3442381.3450082\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Wikidata is rapidly emerging as a key resource for a multitude of online tasks such as Speech Recognition, Entity Linking, Question Answering, or Semantic Search. The value of Wikidata is directly linked to the rich information associated with each entity – that is, the properties describing each entity as well as the relationships to other entities. Despite the tremendous manual and automatic efforts the community invested in the Wikidata project, the growing number of entities (now more than 100 million) presents multiple challenges in terms of knowledge gaps in the graph that are hard to track. To help guide the community in filling the gaps in Wikidata, we propose to identify and rank the properties that an entity might be missing. In this work, we focus on entities with a dedicated Wikipedia page in any language to make predictions directly based on textual content. We show that this problem can be formulated as a multi-label classification problem where every property defined in Wikidata is a potential label. Our main contribution, Wiki2Prop, solves this problem using a multimodal Deep Learning method to predict which properties should be attached to a given entity, using its Wikipedia page embeddings. Moreover, Wiki2Prop is able to incorporate additional features in the form of multilingual embeddings and multimodal data such as images whenever available. We empirically evaluate our approach against the state of the art and show how Wiki2Prop significantly outperforms its competitors for the task of property prediction in Wikidata, and how the use of multilingual and multimodal data improves the results further. Finally, we make Wiki2Prop available as a property recommender system that can be activated and used directly in the context of a Wikidata entity page.\",\"PeriodicalId\":106672,\"journal\":{\"name\":\"Proceedings of the Web Conference 2021\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-04-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Web Conference 2021\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3442381.3450082\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Web Conference 2021","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3442381.3450082","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9

摘要

维基数据正迅速成为众多在线任务的关键资源,如语音识别、实体链接、问题回答或语义搜索。Wikidata的值直接链接到与每个实体相关联的丰富信息——也就是说,描述每个实体的属性以及与其他实体的关系。尽管社区在Wikidata项目上投入了巨大的人工和自动努力,但不断增长的实体数量(现在超过1亿)在图中的知识差距方面提出了多重挑战,这些挑战很难追踪。为了帮助指导社区填补维基数据中的空白,我们建议对实体可能缺失的属性进行识别和排序。在这项工作中,我们专注于具有任何语言的专用维基百科页面的实体,以直接基于文本内容进行预测。我们证明这个问题可以被表述为一个多标签分类问题,其中在维基数据中定义的每个属性都是一个潜在的标签。我们的主要贡献是Wiki2Prop,它使用多模态深度学习方法来预测应该将哪些属性附加到给定实体上,使用它的维基百科页面嵌入。此外,Wiki2Prop能够在可用的情况下以多语言嵌入和多模式数据(如图像)的形式合并其他功能。我们根据最先进的技术对我们的方法进行了实证评估,并展示了Wiki2Prop如何在维基数据的属性预测任务中显著优于其竞争对手,以及如何使用多语言和多模态数据进一步改进结果。最后,我们使Wiki2Prop成为一个属性推荐系统,可以在维基数据实体页面的上下文中直接激活和使用。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Wiki2Prop: A Multimodal Approach for Predicting Wikidata Properties from Wikipedia
Wikidata is rapidly emerging as a key resource for a multitude of online tasks such as Speech Recognition, Entity Linking, Question Answering, or Semantic Search. The value of Wikidata is directly linked to the rich information associated with each entity – that is, the properties describing each entity as well as the relationships to other entities. Despite the tremendous manual and automatic efforts the community invested in the Wikidata project, the growing number of entities (now more than 100 million) presents multiple challenges in terms of knowledge gaps in the graph that are hard to track. To help guide the community in filling the gaps in Wikidata, we propose to identify and rank the properties that an entity might be missing. In this work, we focus on entities with a dedicated Wikipedia page in any language to make predictions directly based on textual content. We show that this problem can be formulated as a multi-label classification problem where every property defined in Wikidata is a potential label. Our main contribution, Wiki2Prop, solves this problem using a multimodal Deep Learning method to predict which properties should be attached to a given entity, using its Wikipedia page embeddings. Moreover, Wiki2Prop is able to incorporate additional features in the form of multilingual embeddings and multimodal data such as images whenever available. We empirically evaluate our approach against the state of the art and show how Wiki2Prop significantly outperforms its competitors for the task of property prediction in Wikidata, and how the use of multilingual and multimodal data improves the results further. Finally, we make Wiki2Prop available as a property recommender system that can be activated and used directly in the context of a Wikidata entity page.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信