语义网上实体链接的自举方法

IF 2.1 3区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Wei Hu, Cunxin Jia
{"title":"语义网上实体链接的自举方法","authors":"Wei Hu, Cunxin Jia","doi":"10.2139/ssrn.3199193","DOIUrl":null,"url":null,"abstract":"In the Big Data era, ever-increasing RDF data have reached a scale in billions of entities and brought challenges to the problem of entity linkage on the Semantic Web. Although millions of entities, typically denoted by URIs, have been explicitly linked with owl:sameAs, potentially coreferent ones are still numerous. Existing automatic approaches address this problem mainly from two perspectives: one is via equivalence reasoning, which infers semantically coreferent entities but probably misses many potentials; the other is by similarity computation between property-values of entities, which is not always accurate and do not scale well. In this paper, we introduce a bootstrapping approach by leveraging these two kinds of methods for entity linkage. Given an entity, our approach first infers a set of semantically coreferent entities. Then, it iteratively expands this entity set using discriminative property-value pairs. The discriminability is learned with a statistical measure, which does not only identify important property-values in the entity set, but also takes matched properties into account. Frequent property combinations are also mined to improve linkage accuracy. We develop an online entity linkage search engine, and show its superior precision and recall by comparing with representative approaches on a large-scale and two benchmark datasets.","PeriodicalId":49951,"journal":{"name":"Journal of Web Semantics","volume":"30 1","pages":""},"PeriodicalIF":2.1000,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":"{\"title\":\"A Bootstrapping Approach to Entity Linkage on the Semantic Web\",\"authors\":\"Wei Hu, Cunxin Jia\",\"doi\":\"10.2139/ssrn.3199193\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the Big Data era, ever-increasing RDF data have reached a scale in billions of entities and brought challenges to the problem of entity linkage on the Semantic Web. Although millions of entities, typically denoted by URIs, have been explicitly linked with owl:sameAs, potentially coreferent ones are still numerous. Existing automatic approaches address this problem mainly from two perspectives: one is via equivalence reasoning, which infers semantically coreferent entities but probably misses many potentials; the other is by similarity computation between property-values of entities, which is not always accurate and do not scale well. In this paper, we introduce a bootstrapping approach by leveraging these two kinds of methods for entity linkage. Given an entity, our approach first infers a set of semantically coreferent entities. Then, it iteratively expands this entity set using discriminative property-value pairs. The discriminability is learned with a statistical measure, which does not only identify important property-values in the entity set, but also takes matched properties into account. Frequent property combinations are also mined to improve linkage accuracy. We develop an online entity linkage search engine, and show its superior precision and recall by comparing with representative approaches on a large-scale and two benchmark datasets.\",\"PeriodicalId\":49951,\"journal\":{\"name\":\"Journal of Web Semantics\",\"volume\":\"30 1\",\"pages\":\"\"},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2015-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"18\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Web Semantics\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.2139/ssrn.3199193\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Web Semantics","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.2139/ssrn.3199193","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 18

摘要

在大数据时代,不断增长的RDF数据已达到数十亿实体的规模,对语义Web上的实体联动问题提出了挑战。尽管数以百万计的实体(通常由uri表示)已显式地与owl:sameAs链接,但潜在的共指实体仍然很多。现有的自动方法主要从两个方面来解决这个问题:一是通过等价推理,它可以推断出语义上共指的实体,但可能会错过许多潜力;另一种方法是通过实体属性值之间的相似性计算,这种方法并不总是准确的,而且伸缩性不好。在本文中,我们引入了一种利用这两种方法进行实体链接的自举方法。给定一个实体,我们的方法首先推断出一组语义上相互关联的实体。然后,它使用判别性属性值对迭代地扩展这个实体集。可判别性是通过统计度量来学习的,该度量不仅可以识别实体集中重要的属性值,还可以考虑匹配属性。频繁的属性组合也被挖掘以提高链接的准确性。我们开发了一个在线实体链接搜索引擎,并在大规模数据集和两个基准数据集上与代表性方法进行了比较,显示了其优越的查准率和查全率。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A Bootstrapping Approach to Entity Linkage on the Semantic Web
In the Big Data era, ever-increasing RDF data have reached a scale in billions of entities and brought challenges to the problem of entity linkage on the Semantic Web. Although millions of entities, typically denoted by URIs, have been explicitly linked with owl:sameAs, potentially coreferent ones are still numerous. Existing automatic approaches address this problem mainly from two perspectives: one is via equivalence reasoning, which infers semantically coreferent entities but probably misses many potentials; the other is by similarity computation between property-values of entities, which is not always accurate and do not scale well. In this paper, we introduce a bootstrapping approach by leveraging these two kinds of methods for entity linkage. Given an entity, our approach first infers a set of semantically coreferent entities. Then, it iteratively expands this entity set using discriminative property-value pairs. The discriminability is learned with a statistical measure, which does not only identify important property-values in the entity set, but also takes matched properties into account. Frequent property combinations are also mined to improve linkage accuracy. We develop an online entity linkage search engine, and show its superior precision and recall by comparing with representative approaches on a large-scale and two benchmark datasets.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Web Semantics
Journal of Web Semantics 工程技术-计算机:人工智能
CiteScore
6.20
自引率
12.00%
发文量
22
审稿时长
14.6 weeks
期刊介绍: The Journal of Web Semantics is an interdisciplinary journal based on research and applications of various subject areas that contribute to the development of a knowledge-intensive and intelligent service Web. These areas include: knowledge technologies, ontology, agents, databases and the semantic grid, obviously disciplines like information retrieval, language technology, human-computer interaction and knowledge discovery are of major relevance as well. All aspects of the Semantic Web development are covered. The publication of large-scale experiments and their analysis is also encouraged to clearly illustrate scenarios and methods that introduce semantics into existing Web interfaces, contents and services. The journal emphasizes the publication of papers that combine theories, methods and experiments from different subject areas in order to deliver innovative semantic methods and applications.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信