Efficient Selection and Integration of Data Sources for Answering Semantic Web Queries

A. Qasem, Dimitre A. Dimitrov, J. Heflin
Published in: 2008 IEEE International Conference on Semantic Computing
DOI: 10.1109/ICSC.2008.31 (https://doi.org/10.1109/ICSC.2008.31)
Publication date: 2007-11-11
Citation count: 18

Abstract

In this work we adapt an efficient information integration algorithm to identify the minimal set of potentially relevant Semantic Web data sources for a given query. The vast majority of these sources are files written in RDF or OWL format, and must be processed in their entirety. Our adaptation includes enhancing the algorithm with taxonomic reasoning, defining and using a mapping language for aligning heterogeneous Semantic Web ontologies, and introducing a concept of source relevance to reduce the number of sources that we need to consider for a given query. After the source selection process, we load the selected sources into a Semantic Web reasoner to get a sound and complete answer to the query. We have conducted an experiment using synthetic ontologies and data sources which demonstrates that our system performs well over a wide range of queries. A typical response time for a substantial workload of 50 domain ontologies, 80 map ontologies and 500 data sources is less than 2 seconds. Furthermore, our system returned correct answers to 200 randomly generated queries in several workload configurations. We have also compared our adaptation with a basic implementation of the original information integration algorithm that performs no taxonomic reasoning. In the most complex configuration, with 50 domain ontologies, 100 map ontologies and 1000 data sources, our system returns complete answers to all the queries, whereas the basic implementation returns complete answers to only 28% of the queries.
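
The role of taxonomic reasoning in source selection can be illustrated with a minimal sketch. This is not the authors' algorithm or mapping language; it only assumes, for illustration, that each source advertises the classes it describes and that a source is potentially relevant when it describes a queried class or any of its subclasses. The taxonomy, the source names, and the functions `descendants` and `select_sources` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: subclass -> direct superclasses.
SUBCLASS_OF = {
    "GradStudent": ["Student"],
    "Student": ["Person"],
    "Professor": ["Person"],
}

# Hypothetical relevance summaries: source -> classes it describes.
SOURCE_CLASSES = {
    "src1.owl": {"GradStudent"},
    "src2.owl": {"Professor"},
    "src3.owl": {"Student"},
    "src4.owl": {"Course"},
}


def descendants(cls, subclass_of):
    """Return cls plus every class that is transitively a subclass of it."""
    # Invert the subclass relation so we can walk downwards from cls.
    children = defaultdict(set)
    for child, parents in subclass_of.items():
        for parent in parents:
            children[parent].add(child)
    seen, stack = {cls}, [cls]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def select_sources(query_classes, source_classes, subclass_of, taxonomic=True):
    """Select sources describing a queried class (or a subclass, if taxonomic)."""
    selected = set()
    for cls in query_classes:
        wanted = descendants(cls, subclass_of) if taxonomic else {cls}
        for source, described in source_classes.items():
            if described & wanted:
                selected.add(source)
    return selected


with_reasoning = select_sources(["Student"], SOURCE_CLASSES, SUBCLASS_OF)
without_reasoning = select_sources(
    ["Student"], SOURCE_CLASSES, SUBCLASS_OF, taxonomic=False
)
```

In this toy setup, a query for `Student` selects both `src3.owl` and `src1.owl` when subclasses are considered, but only `src3.owl` when they are not. Missing a source in this way is one route by which a selection step without taxonomic reasoning could return incomplete answers; unrelated sources such as `src4.owl` are skipped in both cases.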