Avoiding Chinese Whispers: Controlling End-to-End Join Quality in Linked Open Data Stores

Jan-Christoph Kalo, S. Homoceanu, J. Rose, Wolf-Tilo Balke
DOI: 10.1145/2786451.2786466 (https://doi.org/10.1145/2786451.2786466)
Source: Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference
Publication date: 2015-06-28
Citations: 8

Abstract

Today, Linked Open Data is a central trend in information provisioning. Data is collected in distributed data stores, individually curated with high quality, and made available over the Web for a wide variety of Web applications that provide their own business logic for data utilization. Thus, the key promise of Linked Open Data is to provide a holistic view of a wide range of data items or entities. But, paralleling the problems of database integration and schema matching, linking data across several sources remains a challenge and is currently severely hampering the vision of a working Semantic Web. One possible solution is instance matching systems that automatically create owl:sameAs links between data stores. According to existing benchmarks, their matching quality has even reached a satisfying level. However, our extensive analysis shows that instance matching systems are not yet ready for large-scale data interlinking. This is because a query processor that joins via even a single incorrectly created link also implicitly uses all transitive owl:sameAs links, which may in turn be mismatched. The result is similar to the game Chinese Whispers: watered-down sameAs semantics lead, step by step, to a terrible end-to-end quality of joins. We develop innovative structural mechanisms on top of instance matching systems that significantly improve query processing by avoiding Chinese Whispers.
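The effect described in the abstract can be illustrated with a small sketch. It is not the authors' mechanism: the entity names, the three stores `a`, `b`, `c`, the ground-truth labels, and the pairwise-precision measure are all hypothetical choices made for illustration. The code computes the transitive closure of a set of owl:sameAs links with a union-find structure, then compares the precision of the links themselves with the precision of the entity pairs that a join over the closure would treat as identical.

```python
from collections import defaultdict


def same_as_clusters(links):
    """Group entities into the equivalence classes implied by transitive sameAs links."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for entity in parent:
        clusters[find(entity)].add(entity)
    return list(clusters.values())


def pairwise_precision(clusters, truth):
    """Fraction of entity pairs merged by the closure that truly denote the same thing."""
    correct = total = 0
    for cluster in clusters:
        members = sorted(cluster)
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                total += 1
                correct += truth[a] == truth[b]
    return correct / total if total else 1.0


# Hypothetical ground truth: which real-world thing each entity denotes.
truth = {
    "a:Paris": "city", "b:Paris": "city", "c:Paris": "city",
    "a:ParisHilton": "person", "b:ParisHilton": "person", "c:ParisHilton": "person",
}

# Four correct links plus one mismatch (the last entry).
links = [
    ("a:Paris", "b:Paris"),
    ("b:Paris", "c:Paris"),
    ("a:ParisHilton", "b:ParisHilton"),
    ("b:ParisHilton", "c:ParisHilton"),
    ("c:Paris", "a:ParisHilton"),  # incorrectly created link
]

link_precision = sum(truth[a] == truth[b] for a, b in links) / len(links)
clusters = same_as_clusters(links)
join_precision = pairwise_precision(clusters, truth)

print(f"link precision: {link_precision:.2f}")  # 0.80
print(f"join precision: {join_precision:.2f}")  # 0.40
```

In this toy data the single wrong link merges two otherwise clean clusters, so only 6 of the 15 merged pairs are correct although 4 of the 5 links are. A benchmark that scores individual links would report 0.80, while a query that follows sameAs transitively would see 0.40.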