Answering complex structured queries over the deep web

Fan Wang, G. Agrawal
Published in: Proceedings. International Database Engineering and Applications Symposium, volume 16 1, pages 115-123
Publication date: 2011-09-21
DOI: 10.1145/2076623.2076638 (https://doi.org/10.1145/2076623.2076638)
Citations: 1

Abstract

A large part of the data on the World Wide Web resides in the deep web. Most deep web data sources support only simple text interfaces for querying, which are easy to use but have limited expressive power. As a result, processing complex structured queries over the deep web currently involves a large amount of manual work. Our work addresses the gap between users' need to express and execute complex structured queries over the deep web and the simple, limited input interfaces of existing deep web data sources.

This paper presents a query planning problem formulation, with novel algorithms and optimizations, that enables a high-level and highly expressive query language to be supported over deep web data sources. We target three types of complex queries: select-project-join queries, aggregation queries, and nested queries. We have developed query planning algorithms to generate query plans for each of these, and we propose several optimization techniques to further speed up query plan execution.

In our experiments, we show that our algorithm scales well and that, for over 90% of the experimental queries, the execution time and result quality of the query plans generated by our algorithms are very close to those of the optimal plans generated by an exhaustive search algorithm. Furthermore, our optimization techniques outperform an existing optimization method in terms of both reduction in transmitted data records and query execution speedups.
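To make the planning problem concrete, here is a minimal sketch of why limited input interfaces force a query to be planned across sources. It is not the authors' algorithm: it uses plain greedy forward chaining, and the `Source` type, the `plan` function, and the three example sources are all hypothetical names introduced for illustration. Each source is modeled as a form that can only be queried once its required input attributes are known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """A hypothetical deep web source: a form that needs inputs and returns outputs."""
    name: str
    inputs: frozenset   # attributes the form requires before it can be queried
    outputs: frozenset  # attributes exposed on the result page


def plan(sources, bound, wanted):
    """Greedy forward chaining: repeatedly call any source whose inputs are bound.

    Returns the ordered list of source names, or None if `wanted` is unreachable.
    """
    known = set(bound)
    remaining = list(sources)
    order = []
    while not set(wanted) <= known:
        # Pick the first callable source that contributes at least one new attribute.
        usable = next(
            (s for s in remaining if s.inputs <= known and not s.outputs <= known),
            None,
        )
        if usable is None:
            return None  # no source can be queried with what is known so far
        order.append(usable.name)
        known |= usable.outputs
        remaining.remove(usable)
    return order


# Hypothetical sources, invented for this example only.
SOURCES = [
    Source("gene_lookup", frozenset({"gene"}), frozenset({"protein"})),
    Source("protein_db", frozenset({"protein"}), frozenset({"function"})),
    Source("snp_db", frozenset({"snp"}), frozenset({"gene"})),
]

example_plan = plan(SOURCES, bound={"snp"}, wanted={"function"})
print(example_plan)  # ['snp_db', 'gene_lookup', 'protein_db']
```

The greedy loop only finds some feasible ordering; it may include sources that a better plan would skip, and it ignores execution cost and result quality, which are the dimensions the abstract says the paper's planning algorithms are evaluated on.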