Query optimization for ontology-based information integration

Yingjie Li, J. Heflin
{"title":"Query optimization for ontology-based information integration","authors":"Yingjie Li, J. Heflin","doi":"10.1145/1871437.1871623","DOIUrl":null,"url":null,"abstract":"In recent years, there has been an explosion of publicly available RDF and OWL data sources. In order to effectively and quickly answer queries in such an environment, we present an approach to identifying the potentially relevant Semantic Web data sources using query rewritings and a term index. We demonstrate that such an approach must carefully handle query goals that lack constants; otherwise the algorithm may identify many sources that do not contribute to eventual answers. This is because the term index only indicates if URIs are present in a document, and specific answers to a subgoal cannot be calculated until the source is physically accessed - an expensive operation given disk/network latency. We present an algorithm that, given a set of query rewritings that accounts for ontology heterogeneity, incrementally selects and processes sources in order to maintain selectivity. Once sources are selected, we use an OWL reasoner to answer queries over these sources and their corresponding ontologies. We present the results of experiments using both a synthetic data set and a subset of the real-world Billion Triple Challenge data.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 19th ACM international conference on Information and knowledge management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1871437.1871623","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

Abstract

In recent years, there has been an explosion of publicly available RDF and OWL data sources. In order to effectively and quickly answer queries in such an environment, we present an approach to identifying the potentially relevant Semantic Web data sources using query rewritings and a term index. We demonstrate that such an approach must carefully handle query goals that lack constants; otherwise the algorithm may identify many sources that do not contribute to eventual answers. This is because the term index only indicates if URIs are present in a document, and specific answers to a subgoal cannot be calculated until the source is physically accessed - an expensive operation given disk/network latency. We present an algorithm that, given a set of query rewritings that accounts for ontology heterogeneity, incrementally selects and processes sources in order to maintain selectivity. Once sources are selected, we use an OWL reasoner to answer queries over these sources and their corresponding ontologies. We present the results of experiments using both a synthetic data set and a subset of the real-world Billion Triple Challenge data.
基于本体的信息集成查询优化
近年来,公开可用的RDF和OWL数据源出现了爆炸式增长。为了在这样的环境中有效和快速地回答查询,我们提出了一种使用查询重写和术语索引来识别潜在相关语义Web数据源的方法。我们演示了这种方法必须小心地处理缺少常量的查询目标;否则,算法可能会识别出许多对最终答案没有贡献的来源。这是因为术语索引仅指示文档中是否存在uri,并且在物理访问源之前无法计算子目标的特定答案—在磁盘/网络延迟的情况下,这是一项昂贵的操作。我们提出了一种算法,给定一组考虑本体异质性的查询重写,增量地选择和处理源以保持选择性。一旦选择了源,我们使用OWL推理器来回答对这些源及其相应本体的查询。我们展示了使用合成数据集和真实世界十亿三重挑战数据子集的实验结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信